A Dirichlet process model for detecting positive selection in protein-coding DNA sequences

Cited by: 62
Authors
Huelsenbeck, JP [1]
Jain, S
Frost, SWD
Pond, SLK
Affiliations
[1] Univ Calif San Diego, Div Biol Sci, Sect Ecol Behav & Evolut, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, Dept Family & Prevent Med, Div Biostat & Bioinformat, La Jolla, CA 92093 USA
[3] Univ Calif San Diego, Antiviral Res Ctr, San Diego, CA 92103 USA
DOI
10.1073/pnas.0508279103
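Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general]
Discipline classification code
07; 0710; 09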
Abstract
Most methods for detecting Darwinian natural selection at the molecular level rely on estimating the rates or numbers of nonsynonymous and synonymous changes in an alignment of protein-coding DNA sequences. In some of these methods, the nonsynonymous rate of substitution is allowed to vary across the sequence, permitting the identification of single amino acid positions that are under positive natural selection. However, it is unclear which probability distribution should be used to describe how the nonsynonymous rate of substitution varies across the sequence. One widely used solution is to model variation in the nonsynonymous rate across the sequence as a mixture of several discrete or continuous probability distributions. Unfortunately, there is little population genetics theory to inform us of the appropriate probability distribution for among-site variation in the nonsynonymous rate of substitution. Here, we describe an approach to modeling variation in the nonsynonymous rate of substitution by using a Dirichlet process mixture model. The Dirichlet process allows there to be a countably infinite number of nonsynonymous rate classes and is very flexible in accommodating different potential distributions for the nonsynonymous rate of substitution. We implemented the model in a fully Bayesian approach, with all parameters of the model considered as random variables.
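
To illustrate how a Dirichlet process can yield an open-ended number of rate classes, the following is a minimal sketch of the Chinese restaurant process view of a Dirichlet process prior. It is not the authors' implementation: the function names, the concentration value `alpha = 1.0`, and the exponential base distribution for the class rates are all assumptions made for illustration, and no likelihood or MCMC step is included.

```python
import random


def crp_assign(n_sites, alpha, rng):
    """Assign sites to rate classes under a Chinese restaurant process.

    A site joins existing class k with probability proportional to the
    number of sites already in k, or opens a new class with probability
    proportional to alpha (the concentration parameter).
    """
    assignments = []  # class index for each site
    counts = []       # number of sites in each class
    for _ in range(n_sites):
        weights = counts + [alpha]  # last entry stands for "new class"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)        # open a new rate class
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


def draw_class_rates(n_classes, rng, mean=1.0):
    """Draw one rate per class from an assumed exponential base distribution."""
    return [rng.expovariate(1.0 / mean) for _ in range(n_classes)]


rng = random.Random(0)
assignments, counts = crp_assign(n_sites=100, alpha=1.0, rng=rng)
class_rates = draw_class_rates(len(counts), rng)
site_rates = [class_rates[k] for k in assignments]  # one rate per site
print(f"{len(counts)} rate classes for {len(assignments)} sites")
```

The number of classes is not fixed in advance: it grows with the number of sites and with `alpha`, which is the sense in which the prior accommodates a countably infinite number of classes. In a full analysis these assignments and rates would be updated by posterior sampling rather than drawn once from the prior.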
Pages: 6263-6268
Page count: 6
Related papers
38 in total
[1]  
Abramowitz M., 1972, HDB MATH FUNCTIONS F
[2]   MIXTURES OF DIRICHLET PROCESSES WITH APPLICATIONS TO BAYESIAN NONPARAMETRIC PROBLEMS [J].
ANTONIAK, CE .
ANNALS OF STATISTICS, 1974, 2 (06) :1152-1174
[3]  
Comeron JM, 1995, J MOL EVOL, V41, P1152, DOI 10.1007/BF00173196
[4]   Parallel evolution of drug resistance in HIV: Failure of nonsynonymous/synonymous substitution rate ratio to detect selection [J].
Crandall, KA ;
Kelsey, CR ;
Imamichi, H ;
Lane, HC ;
Salzman, NP .
MOLECULAR BIOLOGY AND EVOLUTION, 1999, 16 (03) :372-382
[5]   EVOLUTIONARY TREES FROM DNA-SEQUENCES - A MAXIMUM-LIKELIHOOD APPROACH [J].
FELSENSTEIN, J .
JOURNAL OF MOLECULAR EVOLUTION, 1981, 17 (06) :368-376
[6]   BAYESIAN ANALYSIS OF SOME NONPARAMETRIC PROBLEMS [J].
FERGUSON, TS .
ANNALS OF STATISTICS, 1973, 1 (02) :209-230
[7]   Long term trends in the evolution of H(3) HA1 human influenza type A [J].
Fitch, WM ;
Bush, RM ;
Bender, CA ;
Cox, NJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1997, 94 (15) :7712-7718
[8]  
HASTINGS WK, 1970, BIOMETRIKA, V57, P97, DOI 10.1093/biomet/57.1.97
[9]   Bayesian estimation of positively selected sites [J].
Huelsenbeck, JP ;
Dyer, KA .
JOURNAL OF MOLECULAR EVOLUTION, 2004, 58 (06) :661-672
[10]   PATTERN OF NUCLEOTIDE SUBSTITUTION AT MAJOR HISTOCOMPATIBILITY COMPLEX CLASS-I LOCI REVEALS OVERDOMINANT SELECTION [J].
HUGHES, AL ;
NEI, M .
NATURE, 1988, 335 (6186) :167-170