Protein evolution viewed through Escherichia coli protein sequences: Introducing the notion of a structural segment of homology, the module

Cited by: 84
Authors
Riley, M [1]
Labedan, B [1]
Affiliations
[1] UNIV PARIS 11, CNRS URA 1354, INST GENET MICROBIOL, F-91405 ORSAY, FRANCE
Keywords
E-coli; paralogous; multimodular proteins; evolution; protein families;
DOI
10.1006/jmbi.1997.1003
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Paralogous genes are genes which descend from a progenitor gene which has duplicated as an ancestral gene, each copy having diverged prior to speciation. With comprehensive information available on functions of Escherichia coli proteins, analysis of sequence-related E. coli paralogous proteins can give information on the early ancestors of families of proteins now residing in many contemporary organisms, such as the enzymes of metabolism, some kinds of transport mechanisms and some kinds of regulatory mechanisms. In the first step, we have confirmed that E. coli contains a very high proportion of paralogous proteins. Next, we have defined two main classes of paralogous proteins. One class is formed of proteins which contain a unique structural segment homologous to a single set of related proteins. The other class corresponds to proteins which contain more than one structural segment of homology, each segment homologous to unrelated sets of proteins. We define such an independent structural segment of homology as a module. This modular structure (mean length equivalent to 209 amino acids) corresponds often to entire proteins, but there are also proteins that appear to be assembled from two or three independent modules having independent origins. Most multimodular proteins appear to have been formed early in their history; a minority appear to be relatively recent fusions of independent modules. Examining 1404 independent structural segments of homology, composed of both modules and entire proteins, we found that the segments of homology fell into 352 sequence-related groups or families. The majority of these families (ranging from 2 to 62 members) are functionally homogeneous. This strongly suggests that the 1404 present-day modules and proteins derive from a minimal set of 352 ancestral modules, each one being already of the same size and having a function similar to all members of its progeny. (C) 1997 Academic Press Limited.
Pages: 857-868
Page count: 12