A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data

Cited by: 4392
Author
Li, Heng [1]
Affiliation
[1] Broad Inst, Med Populat Genet Program, Cambridge Ctr 7, Cambridge, MA 02142 USA
Funding
National Institutes of Health (NIH)
Keywords
READ ALIGNMENT; GENOME; GENOTYPE; ACCURATE; HAPLOTYPES; FREQUENCY; DESIGN; FORMAT;
DOI
10.1093/bioinformatics/btr509
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Motivation: Most existing methods for DNA sequence analysis rely on accurate sequences or genotypes. However, in applications of next-generation sequencing (NGS), accurate genotypes may not be easily obtained (e.g. multi-sample low-coverage sequencing or somatic mutation discovery). These applications press for the development of new methods for analyzing sequence data with uncertainty.
Results: We present a statistical framework for calling SNPs, discovering somatic mutations, inferring population genetical parameters and performing association tests directly based on sequencing data without explicit genotyping or linkage-based imputation. On real data, we demonstrate that our method achieves comparable accuracy to alternative methods for estimating site allele count, for inferring allele frequency spectrum and for association mapping. We also highlight the necessity of using symmetric datasets for finding somatic mutations and confirm that for discovering rare events, mismapping is frequently the leading source of errors.
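The abstract's central idea, working from per-sample genotype likelihoods instead of hard genotype calls, can be illustrated with a small Python sketch. This is a hypothetical simplified model, not the paper's implementation: it assumes a biallelic site, independent base errors, and that an error turns one allele into the other with probability ε. Under those assumptions the likelihood of g reference alleles among m chromosome copies is a product over reads of terms such as g(1 - ε) + (m - g)ε, divided by m^k for k reads. The allele-frequency step is a standard EM under a Hardy-Weinberg prior that sums over genotypes rather than calling them. The function names `genotype_likelihoods` and `estimate_alt_freq` are illustrative only.

```python
import math
from typing import List, Sequence


def genotype_likelihoods(ref_errs: Sequence[float], alt_errs: Sequence[float],
                         ploidy: int = 2) -> List[float]:
    """Return L(g) for g = 0..ploidy, where g counts reference alleles.

    ref_errs: error probabilities of read bases that match the reference.
    alt_errs: error probabilities of read bases showing the alternate allele.
    Hypothetical simplified model: biallelic site, independent errors, and
    an error turns one allele into the other.
    """
    m = ploidy
    k = len(ref_errs) + len(alt_errs)
    liks = []
    for g in range(m + 1):
        lik = 1.0
        for e in ref_errs:
            # Reference base: a reference copy read correctly, or an
            # alternate copy misread as reference.
            lik *= g * (1 - e) + (m - g) * e
        for e in alt_errs:
            # Alternate base: the mirror image of the case above.
            lik *= (m - g) * (1 - e) + g * e
        # Each read is assumed to sample one of the m copies uniformly.
        liks.append(lik / m ** k)
    return liks


def estimate_alt_freq(sample_liks: Sequence[Sequence[float]], ploidy: int = 2,
                      max_iter: int = 200, tol: float = 1e-10) -> float:
    """EM estimate of the alternate-allele frequency f across samples,
    summing over genotypes instead of calling them (Hardy-Weinberg prior)."""
    m = ploidy
    n = len(sample_liks)
    f = 0.5  # arbitrary starting point
    for _ in range(max_iter):
        expected_alt = 0.0
        for liks in sample_liks:
            # E-step: posterior over g (reference-allele count) for this sample.
            post = [liks[g] * math.comb(m, g) * (1 - f) ** g * f ** (m - g)
                    for g in range(m + 1)]
            total = sum(post)
            expected_alt += sum((m - g) * p for g, p in enumerate(post)) / total
        # M-step: expected alternate alleles over all n * m chromosomes.
        new_f = expected_alt / (n * m)
        if abs(new_f - f) < tol:
            return new_f
        f = new_f
    return f


if __name__ == "__main__":
    het = genotype_likelihoods([0.01] * 10, [0.01] * 10)
    hom_ref = genotype_likelihoods([0.01] * 20, [])
    hom_alt = genotype_likelihoods([], [0.01] * 20)
    # One heterozygote, one reference homozygote, one alternate homozygote:
    # prints a value very close to 0.5.
    print(estimate_alt_freq([het, hom_ref, hom_alt]))
```

The plain products above underflow at high read depth; a practical implementation would work with log-likelihoods (for example, Phred-scaled values) instead.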
Pages: 2987-2993
Page count: 7