coalescent theory;
sequencing error;
mutation rate;
SNP frequency spectrum;
generalized least squares;
SEGREGATING SITES;
INDIVIDUALS;
NEUTRALITY;
MODELS;
SIZE;
DOI:
10.1093/molbev/msp059
Chinese Library Classification (CLC) number:
Q5 [Biochemistry];
Q7 [Molecular Biology];
Discipline classification code:
071010;
081704;
Abstract:
One challenge in analyzing samples of DNA sequences is to account for the nonnegligible polymorphisms produced by error when the sequencing error rate is high or the sample size is large. Specifically, these artificial sequence variations bias the observed single nucleotide polymorphism (SNP) frequency spectrum, which in turn may bias estimators of the population mutation rate θ = 4N_eμ for diploids. In this paper, we propose a new approach based on the generalized least squares (GLS) method to estimate θ, given a SNP frequency spectrum in a random sample of DNA sequences from a population. With this approach, the error rate ε can be either known or unknown. In the latter case, ε can be estimated given an estimate of θ. Using coalescent simulation, we compared our estimators with other estimators of θ. The results showed that the GLS estimators are more efficient than other θ estimators in the presence of error, and that the estimate of ε is usable in practice when θ per bp is small. We demonstrate the application of the estimators with a 10-kb noncoding region sequence sampled from a human population and provide suggestions for choosing θ estimators in the presence of error.
Institutions:
Univ So Calif, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
Univ So Calif, Dept Prevent Med, Keck Sch Med, Los Angeles, CA 90089 USA
Jiang, Rong
Tavare, Simon
Institutions:
Univ So Calif, Mol & Computat Biol Program, Los Angeles, CA 90089 USA
Univ Cambridge, Dept Oncol, Cambridge CB2 0RE, England
Univ So Calif, Dept Prevent Med, Keck Sch Med, Los Angeles, CA 90089 USA
Tavare, Simon
Marjoram, Paul
Institutions:
Univ So Calif, Dept Prevent Med, Keck Sch Med, Los Angeles, CA 90089 USA