Combining multiple data sets in a likelihood analysis: Which models are the best?

Cited by: 86
Authors
Pupko, T
Huchon, D
Cao, Y
Okada, N
Hasegawa, M
Affiliations
[1] Inst Stat Math, Minato Ku, Tokyo 1068569, Japan
[2] Tokyo Inst Technol, Fac Biosci & Biotechnol, Mol Evolut Lab, Tokyo 152, Japan
Keywords
combining data sets; phylogeny; maximum likelihood; Mammalia; molecular evolution
DOI
10.1093/oxfordjournals.molbev.a004053
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Until recently, phylogenetic analyses have been routinely based on homologous sequences of a single gene. Given the vast number of gene sequences now available, phylogenetic studies are now based on the analysis of multiple genes. Thus, it has become necessary to devise statistical methods to combine multiple molecular data sets. Here, we compare several models for combining different genes for the purpose of evaluating the likelihood of tree topologies. Three methods of branch length estimation were studied: assuming all genes have the same branch lengths (concatenate model), assuming that branch lengths are proportional among genes (proportional model), or assuming that each gene has a separate set of branch lengths (separate model). We also compared three models of among-site rate variation: the homogeneous model, a model that assumes one gamma parameter for all genes, and a model that assumes one gamma parameter for each gene. On the basis of two nuclear and one mitochondrial amino acid data sets, our results suggest that, depending on the data set chosen, either the separate model or the proportional model represents the most appropriate method for branch length analysis. For all the data sets examined, one gamma parameter for each gene represents the best model for among-site rate variation. Using these models we analyzed alternative mammalian tree topologies, and we describe the effect of the assumed model on the maximum likelihood tree. We show that the choice of the model has an impact on the best phylogeny obtained.
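The three branch-length models and three rate-variation models named in the abstract differ mainly in how many free parameters they fit. The sketch below counts those parameters for a fixed unrooted bifurcating tree; it is an illustration under stated assumptions, not the authors' implementation. The function names are hypothetical, the parameter counts follow the standard textbook definitions of these models, and the AIC helper is included only as one common model-selection criterion, since the abstract does not say which criterion the authors used.

```python
def branch_length_params(model: str, n_taxa: int, n_genes: int) -> int:
    """Free branch-length parameters for a fixed unrooted bifurcating tree."""
    branches = 2 * n_taxa - 3  # number of branches in an unrooted binary tree
    if model == "concatenate":
        # One shared set of branch lengths for all genes.
        return branches
    if model == "proportional":
        # One shared set plus a relative rate for every gene but the first.
        return branches + (n_genes - 1)
    if model == "separate":
        # An independent set of branch lengths for each gene.
        return branches * n_genes
    raise ValueError(f"unknown branch-length model: {model}")


def rate_variation_params(model: str, n_genes: int) -> int:
    """Free among-site rate variation parameters (gamma shape values)."""
    if model == "homogeneous":
        return 0          # all sites evolve at the same rate
    if model == "shared_gamma":
        return 1          # one gamma shape parameter for all genes
    if model == "per_gene_gamma":
        return n_genes    # one gamma shape parameter per gene
    raise ValueError(f"unknown rate-variation model: {model}")


def total_params(branch_model: str, rate_model: str,
                 n_taxa: int, n_genes: int) -> int:
    """Total free parameters for a combination of the two model choices."""
    return (branch_length_params(branch_model, n_taxa, n_genes)
            + rate_variation_params(rate_model, n_genes))


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion (assumed criterion; lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood
```

For example, with 10 taxa and 3 genes, `total_params("separate", "per_gene_gamma", 10, 3)` counts 51 branch lengths plus 3 gamma parameters. A richer model is preferred under a criterion such as AIC only if its gain in log-likelihood outweighs these extra parameters.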
Pages: 2294-2307
Number of pages: 14
Related papers
34 in total
  • [1] ADACHI J, 1996, COMPUT SCI MONOGR, V28, P1
  • [2] The mitochondrial genome of the sperm whale and a new molecular reference for estimating eutherian divergence dates
    Arnason, U
    Gullberg, A
    Gretarsdottir, S
    Ursing, B
    Janke, A
    [J]. JOURNAL OF MOLECULAR EVOLUTION, 2000, 50 (06) : 569 - 578
  • [3] Burnham K. P., 1998, MODEL SELECTION INFE
  • [4] Conflict among individual mitochondrial proteins in resolving the phylogeny of eutherian orders
    Cao, Y
    Janke, A
    Waddell, PJ
    Westerman, M
    Takenaka, O
    Murata, S
    Okada, N
    Pääbo, S
    Hasegawa, M
    [J]. JOURNAL OF MOLECULAR EVOLUTION, 1998, 47 (03) : 307 - 322
  • [5] Interordinal relationships and timescale of eutherian evolution as inferred from mitochondrial genome data
    Cao, Y
    Fujiwara, M
    Nikaido, M
    Okada, N
    Hasegawa, M
    [J]. GENE, 2000, 259 (1-2) : 149 - 158
  • [6] Mitochondrial genes and mammalian phylogenies: Increasing the reliability of branch length estimation
    Corneli, PS
    Ward, RH
    [J]. MOLECULAR BIOLOGY AND EVOLUTION, 2000, 17 (02) : 224 - 234
  • [7] EVOLUTIONARY TREES FROM DNA-SEQUENCES - A MAXIMUM-LIKELIHOOD APPROACH
    FELSENSTEIN, J
    [J]. JOURNAL OF MOLECULAR EVOLUTION, 1981, 17 (06) : 368 - 376
  • [8] FRIEDMAN N, 2001, P 5 ANN INT C COMP B, P132
  • [9] Graur D., 1999, FUNDAMENTALS MOL EVO
  • [10] A likelihood ratio test to detect conflicting phylogenetic signal
    Huelsenbeck, JP
    Bull, JJ
    [J]. SYSTEMATIC BIOLOGY, 1996, 45 (01) : 92 - 98