SNP Calling, Genotype Calling, and Sample Allele Frequency Estimation from New-Generation Sequencing Data

Cited by: 279
Authors
Nielsen, Rasmus [1, 2, 3, 4]
Korneliussen, Thorfinn [4]
Albrechtsen, Anders [4]
Li, Yingrui [1]
Wang, Jun [1, 4]
Affiliations
[1] BGI Shenzhen, Shenzhen, Peoples R China
[2] Univ Calif Berkeley, Dept Integrat Biol, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[4] Univ Copenhagen, Dept Biol, Copenhagen, Denmark
Source
PLOS ONE | 2012 / Vol. 7 / Issue 07
Funding
National Institutes of Health (USA);
Keywords
ASSOCIATION; INFERENCE; IMPUTATION; SITES;
DOI
10.1371/journal.pone.0037558
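Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;
Abstract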
We present a statistical framework for estimation and application of sample allele frequency spectra from New-Generation Sequencing (NGS) data. In this method, we first estimate the allele frequency spectrum using maximum likelihood. In contrast to previous methods, the likelihood function is calculated using a dynamic programming algorithm and numerically optimized using analytical derivatives. We then use a Bayesian method for estimating the sample allele frequency in a single site, and show how the method can be used for genotype calling and SNP calling. We also show how the method can be extended to various other cases including cases with deviations from Hardy-Weinberg equilibrium. We evaluate the statistical properties of the methods using simulations and by application to a real data set.
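The three steps named in the abstract (a dynamic-programming likelihood, a maximum-likelihood allele frequency spectrum, and a Bayesian per-site posterior) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes diploid individuals, biallelic sites, Hardy-Weinberg proportions, and per-individual genotype likelihoods supplied as input. All function names are hypothetical, and a plain EM update is swapped in for the derivative-based numerical optimization the abstract describes.

```python
import numpy as np
from math import comb


def site_allele_count_likelihoods(gl):
    """Dynamic-programming likelihood of each sample allele count at one site.

    gl[i, g] is the (assumed given) likelihood of individual i's reads when
    it carries g = 0, 1, 2 copies of the derived allele.
    Returns an array of length 2n + 1 with P(reads | k derived alleles).
    """
    gl = np.asarray(gl, dtype=float)
    n = gl.shape[0]
    h = np.zeros(2 * n + 1)
    h[0] = 1.0
    weights = (1.0, 2.0, 1.0)  # binomial(2, g): ways to place g alleles
    for i in range(n):
        top = 2 * (i + 1)  # largest allele count reachable so far
        new = np.zeros_like(h)
        for g in range(3):
            # Adding individual i with genotype g shifts the count by g.
            new[g:top + 1] += h[:top + 1 - g] * weights[g] * gl[i, g]
        h = new
    norm = np.array([comb(2 * n, k) for k in range(2 * n + 1)], dtype=float)
    return h / norm


def estimate_sfs_em(site_lik, n_iter=100):
    """Maximum-likelihood allele frequency spectrum via EM (a substitution
    for the derivative-based optimizer mentioned in the abstract)."""
    site_lik = np.asarray(site_lik, dtype=float)
    sfs = np.full(site_lik.shape[1], 1.0 / site_lik.shape[1])
    for _ in range(n_iter):
        post = site_lik * sfs
        post /= post.sum(axis=1, keepdims=True)
        sfs = post.mean(axis=0)
    return sfs


def posterior_allele_count(lik, sfs):
    """Posterior over the sample allele count at one site, with the
    estimated spectrum used as the prior (empirical Bayes)."""
    post = np.asarray(lik, dtype=float) * np.asarray(sfs, dtype=float)
    return post / post.sum()


# Toy input: 3 sites, 2 individuals each (values are made up).
sites = [
    [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]],
    [[0.1, 0.8, 0.1], [0.7, 0.3, 0.0]],
    [[0.9, 0.1, 0.0], [0.9, 0.1, 0.0]],
]
site_lik = np.array([site_allele_count_likelihoods(s) for s in sites])
sfs = estimate_sfs_em(site_lik)
posterior = posterior_allele_count(site_lik[1], sfs)
```

Under these assumptions, a SNP call for a site could be based on the posterior probability that the allele count is nonzero, that is, `1 - posterior[0]`.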
Pages: 10