Fosmid-based whole genome haplotyping of a HapMap trio child: evaluation of Single Individual Haplotyping techniques

Cited by: 80
Authors
Duitama, Jorge [1, 2, 3]
McEwen, Gayle K. [1]
Huebsch, Thomas [1]
Palczewski, Stefanie [1]
Schulz, Sabrina [1]
Verstrepen, Kevin [2, 3]
Suk, Eun-Kyung [1]
Hoehe, Margret R. [1]
Affiliations
[1] Max Planck Inst Mol Genet, Dept Vertebrate Genom, D-14195 Berlin, Germany
[2] Katholieke Univ Leuven, VIB Lab Syst Biol, Ctr Microbial & Plant Genet, B-3001 Heverlee, Belgium
[3] Katholieke Univ Leuven, Lab Genet & Genom, Ctr Microbial & Plant Genet, B-3001 Heverlee, Belgium
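Funding
European Research Council
Keywords
ALGORITHMS; SEQUENCE
DOI
10.1093/nar/gkr1042
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract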
Determining the underlying haplotypes of individual human genomes is an essential, but currently difficult, step toward a complete understanding of genome function. Fosmid pool-based next-generation sequencing allows genome-wide generation of 40-kb haploid DNA segments, which can be phased into contiguous molecular haplotypes computationally by Single Individual Haplotyping (SIH). Many SIH algorithms have been proposed, but the accuracy of such methods has been difficult to assess due to the lack of real benchmark data. To address this problem, we generated whole genome fosmid sequence data from a HapMap trio child, NA12878, for which reliable haplotypes have already been produced. We assembled haplotypes using eight algorithms for SIH and carried out direct comparisons of their accuracy, completeness and efficiency. Our comparisons indicate that fosmid-based haplotyping can deliver highly accurate results even at low coverage and that our SIH algorithm, ReFHap, is able to efficiently produce high-quality haplotypes. We expanded the haplotypes for NA12878 by combining the current haplotypes with our fosmid-based haplotypes, producing near-to-complete new gold-standard haplotypes containing almost 98% of heterozygous SNPs. This improvement includes notable fractions of disease-related and GWA SNPs. Integrated with other molecular biological data sets, this phase information will advance the emerging field of diploid genomics.
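As a rough illustration of what Single Individual Haplotyping does with haploid fragments, the sketch below phases a toy data set by brute force under the minimum error correction (MEC) objective, a commonly used formulation of the problem. It is not ReFHap, nor any of the eight algorithms compared in the paper; the function names, the 0/1 allele encoding and the fragment data are hypothetical, and exhaustive search is only feasible for a handful of SNPs.

```python
from itertools import product

def mec_cost(haplotype, fragments):
    """Count allele calls that must be corrected so every fragment
    matches either the haplotype or its complement."""
    complement = [1 - allele for allele in haplotype]
    total = 0
    for frag in fragments:
        d_hap = sum(1 for pos, allele in frag.items() if haplotype[pos] != allele)
        d_comp = sum(1 for pos, allele in frag.items() if complement[pos] != allele)
        total += min(d_hap, d_comp)  # assign fragment to the closer copy
    return total

def phase_brute_force(fragments, n_snps):
    """Try every haplotype over n_snps heterozygous sites (toy sizes only)."""
    best, best_cost = None, None
    for bits in product((0, 1), repeat=n_snps - 1):
        candidate = [0, *bits]  # fix the first allele: h and its complement are equivalent
        cost = mec_cost(candidate, fragments)
        if best_cost is None or cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost

# Hypothetical fragments: each maps SNP index -> observed allele (0 or 1).
# Every fragment comes from one parental copy, so it covers only a few SNPs.
fragments = [
    {0: 0, 1: 1, 2: 0},
    {1: 0, 2: 1, 3: 1},
    {2: 0, 3: 0, 4: 1},
    {3: 1, 4: 0},
    {0: 1, 1: 0},
]

haplotype, cost = phase_brute_force(fragments, n_snps=5)
print(haplotype, cost)
```

Overlapping fragments link neighbouring SNPs, which is why five short fragments suffice to recover one contiguous haplotype here; the second haplotype is simply the complement at each heterozygous site.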
Pages: 2041-2053
Page count: 13