SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples

Times cited: 108
Authors
Le, Si Quang [1]
Durbin, Richard [1]
Affiliations
Funding
Wellcome Trust (UK)
Keywords
GENOME SEQUENCE
DOI
10.1101/gr.113084.110
Chinese Library Classification (CLC) codes
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Reductions in the cost of sequencing have enabled whole-genome sequencing to identify sequence variants segregating in a population. An efficient approach is to sequence many samples at low coverage, then combine data across samples to detect shared variants. Here, we present methods to discover and genotype single-nucleotide polymorphism (SNP) sites from low-coverage sequencing data, making use of shared haplotype (linkage disequilibrium) information. For each population, we first collect SNP candidates based on independent sequence calls per site. We then use MARGARITA with genotype or phased haplotype data from the same samples to collect 20 ancestral recombination graphs (ARGs). We refine the posterior probability of SNP candidates by considering possible mutations at internal branches of the 40 marginal ancestral trees inferred from the 20 ARGs at the left and right flanking genotype sites. Using a population genetic prior distribution on tree-branch length and Bayesian inference, we determine a posterior probability of the SNP being real and also the most probable phased genotype call for each individual. We present experiments on both simulated data and real data from the 1000 Genomes Project to demonstrate the applicability of the methods. We also explore the tradeoff between sequencing depth and the number of sequenced samples.
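The refinement step described in the abstract (placing a candidate mutation on a branch of a marginal tree, weighting branches by a length-based prior, and scoring each placement against the read data) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-read error model, the prior of mu × branch length for a mutation on a branch, the equal weighting of trees, and all function names are hypothetical. Branch lengths are taken as given rather than drawn from a population genetic prior, MARGARITA is not called, and which branches to pass in (the abstract mentions internal branches) is left to the caller. Haplotypes 2i and 2i + 1 belong to individual i.

```python
import math
from typing import FrozenSet, List, Sequence, Tuple

Read = Tuple[str, float]               # (observed base, sequencing error probability)
Branch = Tuple[float, FrozenSet[int]]  # (branch length, haplotype indices below the branch)


def genotype_likelihood(reads: Sequence[Read], g: int, ref: str, alt: str) -> float:
    """P(reads | g) for a diploid carrying g copies (0, 1 or 2) of the alt allele.

    Assumed model: each read comes from either chromosome with probability 1/2,
    and a wrong base call is spread evenly over the three other nucleotides.
    """
    lik = 1.0
    for base, err in reads:
        p_ref = 1.0 - err if base == ref else err / 3.0
        p_alt = 1.0 - err if base == alt else err / 3.0
        lik *= (g / 2.0) * p_alt + (1.0 - g / 2.0) * p_ref
    return lik


def tree_posterior(branches: List[Branch], reads_per_ind: List[Sequence[Read]],
                   ref: str, alt: str, mu: float) -> Tuple[float, FrozenSet[int]]:
    """Posterior P(SNP) on one marginal tree, plus the MAP set of alt-carrying haplotypes.

    Hypothetical prior: a mutation lands on a branch with probability mu * length,
    and the site is monomorphic with the remaining probability.
    """
    # Likelihood of each individual's reads under genotypes 0, 1, 2.
    table = [[genotype_likelihood(r, g, ref, alt) for g in range(3)] for r in reads_per_ind]
    p_none = 1.0 - mu * sum(length for length, _ in branches)
    if p_none <= 0.0:
        raise ValueError("mu * total branch length must be below 1")
    # No-SNP hypothesis: every individual is homozygous ref.
    w_none = p_none * math.prod(row[0] for row in table)
    best_w, best_carriers = w_none, frozenset()
    w_snp = 0.0
    for length, carriers in branches:
        w = mu * length
        for i, row in enumerate(table):
            # Genotype implied by the branch: how many of the individual's two haplotypes carry alt.
            g = int(2 * i in carriers) + int(2 * i + 1 in carriers)
            w *= row[g]
        w_snp += w
        if w > best_w:
            best_w, best_carriers = w, carriers
    return w_snp / (w_snp + w_none), best_carriers


def snp_posterior(trees: List[List[Branch]], reads_per_ind: List[Sequence[Read]],
                  ref: str, alt: str, mu: float) -> float:
    """Average per-tree posteriors with equal weight (e.g. over 40 flanking marginal trees)."""
    return sum(tree_posterior(t, reads_per_ind, ref, alt, mu)[0] for t in trees) / len(trees)
```

For example, with two individuals, the tree ((0,1),(2,3)) with leaf branches of length 1.0 and internal branches of length 0.5, mu = 0.01, three T reads in individual 0 and three A reads in individual 1 (ref A, alt T, error 0.01), `tree_posterior` returns a posterior above 0.99 with MAP carriers {0, 1}, i.e. individual 0 is called homozygous alt on both haplotypes. At realistic depths the products should be computed in log space to avoid underflow.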
Pages: 952-960
Page count: 9