Rate matrices for analyzing large families of protein sequences

Cited by: 15
Authors
Devauchelle, C
Grossmann, A
Hénaut, A
Holschneider, M
Monnerot, M
Risler, JL
Torrésani, B
Affiliations
[1] Lab Genome & Informat, F-91034 Evry, France
[2] CNRS Marseille Luminy, Ctr Phys Theor, F-13288 Marseille, France
[3] Univ Rennes 1, Rennes, France
[4] CNRS, Ctr Genet Mol, Gif Sur Yvette, France
[5] Univ Aix Marseille 1, CMI, Lab Anal Topol & Probabil, Marseille, France
Keywords
protein evolution; rate matrices; LogDet distances; mitochondrial evolution
DOI
10.1089/106652701752236205
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
We propose and study a new approach for the analysis of families of protein sequences. This method is related to the LogDet distances used in phylogenetic reconstructions; it can be viewed as an attempt to embed these distances into a multidimensional framework. The proposed method starts by associating a Markov matrix to each pairwise alignment deduced from a given multiple alignment. The central objects under consideration here are matrix-valued logarithms L of these Markov matrices, which exist under conditions that are compatible with fairly large divergence between the sequences. These logarithms allow us to compare data from a family of aligned proteins with simple models (in particular, continuous reversible Markov models) and to test the adequacy of such models. If one neglects fluctuations arising from the finite length of sequences, any continuous reversible Markov model with a single rate matrix Q over an arbitrary tree predicts that all the observed matrices L are multiples of Q. Our method exploits this fact, without relying on any tree estimation. We test this prediction on a family of proteins encoded by the mitochondrial genome of 26 multicellular animals, which include vertebrates, arthropods, echinoderms, molluscs, and nematodes. A principal component analysis of the observed matrices L shows that a single rate model can be used as a rough approximation to the data, but that systematic deviations from any such model are unmistakable and related to the evolutionary history of the species under consideration.
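The central prediction in the abstract (under a single reversible rate matrix Q, every matrix logarithm L is a multiple of Q) can be checked numerically on a toy case. The following is a minimal sketch, not the authors' procedure: the 3-state alphabet, the entries of `Q`, the divergence values 0.5 and 1.5, and all helper names are assumptions made for illustration, and the Markov matrices are generated exactly as exp(tQ) rather than estimated from alignments, so finite-length fluctuations are absent.

```python
# Toy illustration (3-state alphabet, made-up rates); not the paper's data or code.

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add_scaled(a, b, s):
    """Return a + s * b, entry by entry."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scaled(m, s):
    return [[s * x for x in row] for row in m]

def expm(m, terms=40):
    """Matrix exponential by a truncated Taylor series (adequate for small norms)."""
    result = identity(len(m))
    term = identity(len(m))
    for k in range(1, terms):
        term = scaled(matmul(term, m), 1.0 / k)  # term is now m^k / k!
        result = add_scaled(result, term, 1.0)
    return result

def logm(p, terms=200):
    """Matrix logarithm via the series log(I + D) = D - D^2/2 + D^3/3 - ...

    Converges only when D = P - I is small enough (norm below 1).
    """
    n = len(p)
    d = add_scaled(p, identity(n), -1.0)
    result = [[0.0] * n for _ in range(n)]
    power = identity(n)
    for k in range(1, terms):
        power = matmul(power, d)  # power is now D^k
        result = add_scaled(result, power, (-1.0) ** (k + 1) / k)
    return result

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

# Hypothetical symmetric rate matrix: rows sum to zero; symmetry makes it
# reversible with respect to the uniform distribution.
Q = [[-0.3, 0.2, 0.1],
     [0.2, -0.5, 0.3],
     [0.1, 0.3, -0.4]]

# Two "pairwise comparisons" at different divergences t (arbitrary values).
L_short = logm(expm(scaled(Q, 0.5)))
L_long = logm(expm(scaled(Q, 1.5)))

# Under the single-Q model, every entry ratio should equal 1.5 / 0.5 = 3.
ratios = [L_long[i][j] / L_short[i][j] for i in range(3) for j in range(3)]
print("entry ratios:", [round(r, 6) for r in ratios])

# -trace(L) grows linearly with t and equals -log det P, a LogDet-type quantity.
print("-trace(L):", round(-trace(L_short), 6), round(-trace(L_long), 6))
```

With real alignments the matrices L would only approximately be proportional to one another, and the abstract's principal component analysis is a way of measuring how far the observed L depart from a single common direction. The series-based `logm` here is chosen only to keep the sketch dependency-free; it fails for strongly diverged pairs where P is far from the identity.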
Pages: 381-399
Page count: 19