Estimation of Allele Frequencies From High-Coverage Genome-Sequencing Projects

Times cited: 77
Author
Lynch, Michael [1]
Affiliation
[1] Indiana Univ, Dept Biol, Bloomington, IN 47405 USA
Funding
National Institutes of Health (NIH); National Science Foundation (NSF)
Keywords
DNA; accuracy
DOI
10.1534/genetics.109.100479
Chinese Library Classification (CLC)
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
A new generation of high-throughput sequencing strategies will soon lead to the acquisition of high-coverage genomic profiles of hundreds to thousands of individuals within species, generating unprecedented levels of information on the frequencies of nucleotides segregating at individual sites. However, because these new technologies are error prone and yield uneven coverage of alleles in diploid individuals, they also introduce the need for novel methods for analyzing the raw read data. A maximum-likelihood method for the estimation of allele frequencies is developed, eliminating both the need to arbitrarily discard individuals with low coverage and the requirement for an extrinsic measure of the sequence error rate. The resultant estimates are nearly unbiased with asymptotically minimal sampling variance, thereby defining the limits to our ability to estimate population-genetic parameters and providing a logical basis for the optimal design of population-genomic surveys.
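The abstract describes the estimator only in words. The sketch below illustrates the general idea of joint maximum-likelihood estimation; it is a simplified model, not the paper's exact likelihood. It assumes a single biallelic site, Hardy-Weinberg genotype frequencies, independent reads, and a symmetric error model in which each read reports the wrong allele with probability eps. The allele frequency p and the error rate eps are estimated together by grid search, so no external error-rate estimate is needed, and low-coverage individuals are kept (they simply contribute flatter likelihoods). The function names, the (k, n) read-count encoding, and the grids are hypothetical choices for illustration.

```python
import math
from typing import Sequence, Tuple


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) for a short list of log-terms."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def site_log_likelihood(p: float, eps: float, reads: Sequence[Tuple[int, int]]) -> float:
    """Log-likelihood of (p, eps) at one biallelic site (simplified model).

    reads holds one (k, n) pair per individual: k of the n reads show allele A.
    Assumptions: Hardy-Weinberg genotype frequencies, independent reads, and a
    symmetric error model where a read reports the wrong allele with probability eps.
    """
    # Log prior of genotypes AA, Aa, aa under Hardy-Weinberg proportions
    log_prior = (2 * math.log(p), math.log(2 * p * (1 - p)), 2 * math.log(1 - p))
    # (log P(read shows A), log P(read shows a)) for each genotype
    log_read = (
        (math.log1p(-eps), math.log(eps)),  # AA: shows a only through a read error
        (math.log(0.5), math.log(0.5)),     # Aa: each allele equally likely
        (math.log(eps), math.log1p(-eps)),  # aa: shows A only through a read error
    )
    total = 0.0
    for k, n in reads:
        # Sum over the unobserved genotype; the binomial coefficient is dropped
        # because it does not depend on p or eps.
        terms = [lp + k * la + (n - k) * lb for lp, (la, lb) in zip(log_prior, log_read)]
        total += log_sum_exp(terms)
    return total


def estimate_p_and_eps(reads, p_steps=200, eps_grid=None):
    """Joint grid-search maximum-likelihood estimate of (p, eps)."""
    if eps_grid is None:
        # Hypothetical grid: 40 log-spaced error rates from 1e-4 to 0.2
        eps_grid = [10 ** (-4 + i * math.log10(2000) / 39) for i in range(40)]
    best = (-math.inf, 0.0, 0.0)
    for i in range(1, p_steps):
        p = i / p_steps  # interior points only, so every log() stays finite
        for eps in eps_grid:
            ll = site_log_likelihood(p, eps, reads)
            if ll > best[0]:
                best = (ll, p, eps)
    return best[1], best[2]


if __name__ == "__main__":
    # Ten individuals at 20x coverage: likely 4 AA, 4 Aa, 2 aa, with two read errors
    reads = [(20, 20), (19, 20), (20, 20), (20, 20),
             (10, 20), (9, 20), (11, 20), (10, 20),
             (0, 20), (1, 20)]
    p_hat, eps_hat = estimate_p_and_eps(reads)
    print(f"p_hat = {p_hat:.3f}, eps_hat = {eps_hat:.4f}")
```

With these counts the genotypes are nearly unambiguous, so p_hat should land close to 0.6 (12 A alleles among 20) and eps_hat close to 1/60, the two errors among the 120 homozygote reads, to within the grid's resolution. A finer grid or a numerical optimizer would replace the grid search in practice.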
Pages: 295-301
Number of pages: 7