TESTS OF APPLICABILITY OF SEVERAL SUBSTITUTION MODELS FOR DNA-SEQUENCE DATA

Cited by: 112
Authors
RZHETSKY, A [1]
NEI, M [1]
Affiliation
[1] PENN STATE UNIV,DEPT BIOL,UNIVERSITY PK,PA 16802
Keywords
LINEAR INVARIANTS; TEST STATISTICS; STATIONARITY OF BASE COMPOSITION; NUCLEOTIDE SUBSTITUTION MODELS
DOI
10.1093/oxfordjournals.molbev.a040182
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Using linear invariants for various models of nucleotide substitution, we developed test statistics for examining the applicability of a specific model to a given dataset in phylogenetic inference. The models examined are those developed by Jukes and Cantor (1969), Kimura (1980), Tajima and Nei (1984), Hasegawa et al. (1985), Tamura (1992), Tamura and Nei (1993), and a new model called the eight-parameter model. The first six models are special cases of the last model. The test statistics developed are independent of evolutionary time and phylogeny, although the variances of the statistics contain phylogenetic information. Therefore, these statistics can be used before a phylogenetic tree is estimated. Our objective is to find the simplest model that is applicable to a given dataset, keeping in mind that a simple model usually gives an estimate of evolutionary distance (number of nucleotide substitutions per site) with a smaller variance than a complicated model when the simple model is correct. We have also developed a statistical test of the homogeneity of nucleotide frequencies of a sample of several sequences that takes into account possible phylogenetic correlations. This test is used to examine the stationarity in time of the base frequencies in the sample. For Hasegawa et al.'s and the eight-parameter models, analytical formulas for estimating evolutionary distances are presented. Application of the above tests to several sets of real data has shown that the assumption of stationarity of base composition is usually acceptable when the sequences studied are closely related but otherwise it is rejected. Similarly, the simple models of nucleotide substitution are almost always rejected when actual genes are distantly related and/or the total number of nucleotides examined is large.
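As a rough illustration of what "estimate of evolutionary distance (number of nucleotide substitutions per site)" means for the two simplest models named in the abstract, the sketch below computes the standard Jukes and Cantor (1969) and Kimura (1980) pairwise distances from two aligned sequences. It is not taken from the paper: the function names and the example sequences are hypothetical, sites containing anything other than A, C, G, or T are simply skipped (an assumption), and the formulas for Hasegawa et al.'s model and the eight-parameter model are not reproduced here because the abstract does not state them.

```python
import math

PURINES = {"A", "G"}


def count_differences(seq1, seq2):
    """Return (sites compared, transitions, transversions) for two aligned sequences.

    Assumption: sites where either sequence has a character other than
    A, C, G, or T (gaps, ambiguity codes) are skipped.
    """
    n = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        n += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    return n, transitions, transversions


def jc69_distance(seq1, seq2):
    """Jukes-Cantor distance: d = -(3/4) ln(1 - 4p/3), p = proportion of differing sites."""
    n, ts, tv = count_differences(seq1, seq2)
    p = (ts + tv) / n
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def k80_distance(seq1, seq2):
    """Kimura two-parameter distance: d = -(1/2) ln((1 - 2P - Q) * sqrt(1 - 2Q)),
    where P and Q are the proportions of transitional and transversional differences."""
    n, ts, tv = count_differences(seq1, seq2)
    P = ts / n
    Q = tv / n
    return -0.5 * math.log((1.0 - 2.0 * P - Q) * math.sqrt(1.0 - 2.0 * Q))


if __name__ == "__main__":
    # Hypothetical toy alignment: one transitional difference in ten sites.
    s1 = "ACGTACGTAC"
    s2 = "ACGTACGTAT"
    print(round(jc69_distance(s1, s2), 6))
    print(round(k80_distance(s1, s2), 6))
```

Both functions raise an error when the sequences are too divergent for the logarithm to be defined, or when no comparable sites remain. A real analysis would need to handle those cases and report a variance alongside each estimate.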
Pages: 131-151
Page count: 21