Next-generation VariationHunter: combinatorial algorithms for transposon insertion discovery

Citations: 150
Authors
Hormozdiari, Fereydoun [2 ]
Hajirasouliha, Iman [2 ]
Dao, Phuong [2 ]
Hach, Faraz [2 ]
Yorukoglu, Deniz [2 ]
Alkan, Can [1 ,3 ]
Eichler, Evan E. [1 ,3 ]
Sahinalp, S. Cenk [2 ]
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[2] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[3] Howard Hughes Med Inst, Seattle, WA USA
Funding
Natural Sciences and Engineering Research Council of Canada; US National Institutes of Health
Keywords
STRUCTURAL VARIATION; HUMAN GENOME; COPY NUMBER; ELEMENTS; FRAMEWORK; VARIANTS;
DOI
10.1093/bioinformatics/btq216
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Recent years have witnessed an increase in research activity for the detection of structural variants (SVs) and their association with human disease. The advent of next-generation sequencing technologies makes it possible to extend the scope of structural variation studies to a point previously unimaginable, as exemplified by the 1000 Genomes Project. Although various computational methods have been described for the detection of SVs, no such algorithm is yet fully capable of discovering transposon insertions, a class of SVs that is very important to the study of human evolution and disease. In this article, we provide a complete and novel formulation to discover both the loci and the classes of transposons inserted into genomes sequenced with high-throughput sequencing technologies. In addition, we present 'conflict resolution' improvements to our earlier combinatorial SV detection algorithm (VariationHunter) by taking the diploid nature of the human genome into consideration. We test our algorithms with simulated data from the Venter genome (HuRef) and are able to discover >85% of transposon insertion events with precision of >90%. We also demonstrate that our conflict resolution algorithm (denoted as VariationHunter-CR) outperforms current state-of-the-art algorithms (such as the original VariationHunter, BreakDancer and MoDIL) when tested on the genome of the Yoruba African individual (NA18507).
Availability: An implementation of the algorithm is available at http://compbio.cs.sfu.ca/strvar.htm.
Contact: eee@gs.washington.edu; cenk@cs.sfu.ca
Supplementary information: Supplementary data are available at Bioinformatics online.
Pages: i350-i357
Page count: 8
Related papers
28 items in total
[1]   Personalized copy number and segmental duplication maps using next-generation sequencing [J].
Alkan, Can ;
Kidd, Jeffrey M. ;
Marques-Bonet, Tomas ;
Aksay, Gozde ;
Antonacci, Francesca ;
Hormozdiari, Fereydoun ;
Kitzman, Jacob O. ;
Baker, Carl ;
Malig, Maika ;
Mutlu, Onur ;
Sahinalp, S. Cenk ;
Gibbs, Richard A. ;
Eichler, Evan E. .
NATURE GENETICS, 2009, 41 (10) :1061-U29
[2]   An Alu transposition model for the origin and expansion of human segmental duplications [J].
Bailey, JA ;
Liu, G ;
Eichler, EE .
AMERICAN JOURNAL OF HUMAN GENETICS, 2003, 73 (04) :823-834
[3]   Evaluation of paired-end sequencing strategies for detection of genome rearrangements in cancer [J].
Bashir, Ali ;
Volik, Stanislav ;
Collins, Colin ;
Bafna, Vineet ;
Raphael, Benjamin J. .
PLOS COMPUTATIONAL BIOLOGY, 2008, 4 (04)
[4]   Death and Resurrection of the Human IRGM Gene [J].
Bekpen, Cemalettin ;
Marques-Bonet, Tomas ;
Alkan, Can ;
Antonacci, Francesca ;
Leogrande, Maria Bruna ;
Ventura, Mario ;
Kidd, Jeffrey M. ;
Siswara, Priscillia ;
Howard, Jonathan C. ;
Eichler, Evan E. .
PLOS GENETICS, 2009, 5 (03)
[5]   Accurate whole human genome sequencing using reversible terminator chemistry [J].
Bentley, David R. ;
Balasubramanian, Shankar ;
Swerdlow, Harold P. ;
Smith, Geoffrey P. ;
Milton, John ;
Brown, Clive G. ;
Hall, Kevin P. ;
Evers, Dirk J. ;
Barnes, Colin L. ;
Bignell, Helen R. ;
Boutell, Jonathan M. ;
Bryant, Jason ;
Carter, Richard J. ;
Cheetham, R. Keira ;
Cox, Anthony J. ;
Ellis, Darren J. ;
Flatbush, Michael R. ;
Gormley, Niall A. ;
Humphray, Sean J. ;
Irving, Leslie J. ;
Karbelashvili, Mirian S. ;
Kirk, Scott M. ;
Li, Heng ;
Liu, Xiaohai ;
Maisinger, Klaus S. ;
Murray, Lisa J. ;
Obradovic, Bojan ;
Ost, Tobias ;
Parkinson, Michael L. ;
Pratt, Mark R. ;
Rasolonjatovo, Isabelle M. J. ;
Reed, Mark T. ;
Rigatti, Roberto ;
Rodighiero, Chiara ;
Ross, Mark T. ;
Sabot, Andrea ;
Sankar, Subramanian V. ;
Scally, Aylwyn ;
Schroth, Gary P. ;
Smith, Mark E. ;
Smith, Vincent P. ;
Spiridou, Anastassia ;
Torrance, Peta E. ;
Tzonev, Svilen S. ;
Vermaas, Eric H. ;
Walter, Klaudia ;
Wu, Xiaolin ;
Zhang, Lu ;
Alam, Mohammed D. ;
Anastasi, Carole .
NATURE, 2008, 456 (7218) :53-59
[6]   Chen K, 2009, NAT METHODS, V6, P677, DOI 10.1038/nmeth.1363
[7]   Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants [J].
Du, Jiang ;
Bjornson, Robert D. ;
Zhang, Zhengdong D. ;
Kong, Yong ;
Snyder, Michael ;
Gerstein, Mark B. .
PLOS COMPUTATIONAL BIOLOGY, 2009, 5 (07)
[8]   Base-calling of automated sequencer traces using phred. II. Error probabilities [J].
Ewing, B ;
Green, P .
GENOME RESEARCH, 1998, 8 (03) :186-194
[9]   Structural variation in the human genome [J].
Feuk, L ;
Carson, AR ;
Scherer, SW .
NATURE REVIEWS GENETICS, 2006, 7 (02) :85-97
[10]   EFFICIENT ALGORITHMS FOR INTERVAL-GRAPHS AND CIRCULAR-ARC GRAPHS [J].
GUPTA, UI ;
LEE, DT ;
LEUNG, JYT .
NETWORKS, 1982, 12 (04) :459-467