Inferring Population Mutation Rate and Sequencing Error Rate Using the SNP Frequency Spectrum in a Sample of DNA Sequences

Cited by: 9
Authors
Liu, Xiaoming [1]
Maxwell, Taylor J. [1]
Boerwinkle, Eric [1]
Fu, Yun-Xin [1]
Affiliations
[1] Univ Texas Hlth Sci Ctr Houston, Ctr Human Genet, Sch Publ Hlth, Houston, TX USA
Funding
National Institutes of Health (USA);
Keywords
coalescent theory; sequencing error; mutation rate; SNP frequency spectrum; generalized least squares; SEGREGATING SITES; INDIVIDUALS; NEUTRALITY; MODELS; SIZE;
DOI
10.1093/molbev/msp059
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
One challenge of analyzing samples of DNA sequences is to account for the nonnegligible polymorphisms produced by error when the sequencing error rate is high or the sample size is large. Specifically, those artificial sequence variations will bias the observed single nucleotide polymorphism (SNP) frequency spectrum, which in turn may further bias the estimators of the population mutation rate theta = 4N_e mu for diploids. In this paper, we propose a new approach based on the generalized least squares (GLS) method to estimate theta, given a SNP frequency spectrum in a random sample of DNA sequences from a population. With this approach, the error rate epsilon can be either known or unknown. In the latter case, epsilon can be estimated given an estimate of theta. Using coalescent simulation, we compared our estimators with other estimators of theta. The results showed that the GLS estimators are more efficient than other theta estimators in the presence of error, and that the estimate of epsilon is usable in practice when theta per bp is small. We demonstrate the application of the estimators with 10-kb noncoding region sequences sampled from a human population and provide suggestions for choosing theta estimators in the presence of error.
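The bias described in the abstract can be illustrated with a small sketch. It is not the GLS estimator proposed in the paper; it only uses the standard neutral-model expectation for the unfolded SNP frequency spectrum, E[xi_i] = theta / i, together with two classical estimators (Watterson's and Tajima's) and a simple variant that ignores singletons. The function names, the sample size, and the assumption that sequencing errors appear only as extra singletons are all illustrative choices, not taken from the record.

```python
def harmonic(n):
    """a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def expected_sfs(theta, n):
    """Expected unfolded spectrum under the standard neutral model:
    E[xi_i] = theta / i for i = 1 .. n-1."""
    return [theta / i for i in range(1, n)]


def watterson_theta(sfs):
    """Watterson's estimator: number of segregating sites divided by a_n."""
    n = len(sfs) + 1
    return sum(sfs) / harmonic(n)


def tajima_pi(sfs):
    """Tajima's estimator: mean number of pairwise differences."""
    n = len(sfs) + 1
    pairs = n * (n - 1) / 2
    return sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs


def theta_without_singletons(sfs):
    """Illustrative variant that drops the singleton class xi_1
    (requires n >= 3). Since E[S - xi_1] = theta * (a_n - 1),
    dividing by (a_n - 1) stays unbiased under the neutral model."""
    n = len(sfs) + 1
    return sum(sfs[1:]) / (harmonic(n) - 1.0)


if __name__ == "__main__":
    theta, n = 5.0, 10            # hypothetical values for illustration
    clean = expected_sfs(theta, n)
    noisy = list(clean)
    noisy[0] += 3.0               # assumption: errors add 3 spurious singletons
    for name, f in [("Watterson", watterson_theta),
                    ("Tajima", tajima_pi),
                    ("no singletons", theta_without_singletons)]:
        print(name, round(f(clean), 3), round(f(noisy), 3))
```

On the error-free expected spectrum all three functions return theta. Once spurious singletons are added, Watterson's estimator is inflated most, Tajima's less so, and the singleton-free variant is unaffected under this simplified error model, at the cost of discarding real singletons.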
Pages: 1479 - 1490
Number of pages: 12
Related papers
23 in total
  • [1] Testing for neutrality in samples with sequencing errors
    Achaz, Guillaume
    [J]. GENETICS, 2008, 179 (03) : 1409 - 1424
  • [2] A robust measure of HIV-1 population turnover within chronically infected individuals
    Achaz, G
    Palmer, S
    Kearney, M
    Maldarelli, F
    Mellors, JW
    Coffin, JM
    Wakeley, J
    [J]. MOLECULAR BIOLOGY AND EVOLUTION, 2004, 21 (10) : 1902 - 1912
  • [3] CLARK AG, 1992, MOL BIOL EVOL, V9, P744
  • [4] The patterns of natural variation in human genes
    Crawford, DC
    Akey, DT
    Nickerson, DA
    [J]. ANNUAL REVIEW OF GENOMICS AND HUMAN GENETICS, 2005, 6 : 287 - 312
  • [5] MUSCLE: multiple sequence alignment with high accuracy and high throughput
    Edgar, RC
    [J]. NUCLEIC ACIDS RESEARCH, 2004, 32 (05) : 1792 - 1797
  • [6] STATISTICAL PROPERTIES OF SEGREGATING SITES
    FU, YX
    [J]. THEORETICAL POPULATION BIOLOGY, 1995, 48 (02) : 172 - 197
  • [7] FU YX, 1994, GENETICS, V136, P685
  • [8] FU YX, 1993, GENETICS, V133, P693
  • [9] FU YX, 1994, GENETICS, V138, P1375
  • [10] Population genetic analysis of shotgun assemblies of genomic sequences from multiple individuals
    Hellmann, Ines
    Mang, Yuan
    Gu, Zhiping
    Li, Peter
    de la Vega, Francisco M.
    Clark, Andrew G.
    Nielsen, Rasmus
    [J]. GENOME RESEARCH, 2008, 18 (07) : 1020 - 1029