PRISM: Pair-read informed split-read mapping for base-pair level detection of insertion, deletion and structural variants

Citations: 89
Authors
Jiang, Yue [1, 2]
Wang, Yadong [1]
Brudno, Michael [2, 3]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Ctr Biomed Informat, Harbin 150001, Heilongjiang, Peoples R China
[2] Univ Toronto, Donnelly Ctr, Dept Comp Sci, Toronto, ON M5S 3G4, Canada
[3] Hosp Sick Children, Ctr Computat Med, Toronto, ON M5G 1X8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
SAVANT GENOME BROWSER; COPY NUMBER VARIATION; EXACT BREAKPOINTS; SEQUENCE; ACCURATE;
DOI
10.1093/bioinformatics/bts484
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Motivation: The development of high-throughput sequencing technologies has enabled novel methods for detecting structural variants (SVs). Current methods are typically based on depth of coverage or paired-end mapping clusters. However, most of these report only an approximate location for each SV, rather than exact breakpoints.
Results: We have developed pair-read informed split mapping (PRISM), a method that identifies SVs and their precise breakpoints from whole-genome resequencing data. PRISM uses a split-alignment approach informed by the mapping of paired-end reads, enabling breakpoint identification for multiple SV types, including arbitrary-sized inversions, deletions and tandem duplications. Comparisons to previous datasets and simulation experiments illustrate PRISM's high sensitivity, while PCR validations of PRISM results, including previously uncharacterized variants, indicate an overall precision of approximately 90%.
Pages: 2576-2583
Page count: 8