EMPIRICAL AND STRUCTURAL MODELS FOR INSERTIONS AND DELETIONS IN THE DIVERGENT EVOLUTION OF PROTEINS

Cited by: 142
Authors
BENNER, SA [1]
COHEN, MA [1]
GONNET, GH [1]
Affiliations
[1] SWISS FED INST TECHNOL, INST SCI COMPUTAT, CH-8092 ZURICH, SWITZERLAND
Keywords
PROTEIN STRUCTURE; EVOLUTION; INSERTIONS DELETIONS; PROTEIN STRUCTURE PREDICTION
DOI
10.1006/jmbi.1993.1105
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The exhaustive matching of the protein sequence database makes possible a broadly based study of insertions and deletions (indels) during divergent evolution. In this study, the probability of a gap in an alignment of a pair of homologous protein sequences was found to increase with the evolutionary distance measured in PAM units (number of accepted point mutations per 100 amino acid residues). A relationship between the average number of amino acid residues between indels and evolutionary distance suggests that a unit 30 to 40 amino acid residues in length remains, on average, undisrupted by indels during divergent evolution. Further, the probability of a gap was found to be inversely proportional to gap length raised to the 1.7 power. This empirical law fits closely over the entire range of gap lengths examined. Gap length distribution is largely independent of evolutionary distance. These results rule out the widely used linear gap penalty as a satisfactory formula for scoring gaps when constructing alignments. Further, the observed gap length distribution can be explained by a simple model of selective pressures governing the acceptance of indels during divergent evolution. Finally, this model provides theoretical support for using indels as part of "parsing algorithms", important in the de novo prediction of the folded structure of proteins from the sequence data. © 1993 Academic Press, Inc.
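The power law stated in the abstract can be illustrated with a small numerical sketch. This is not the authors' scoring scheme. It only assumes that the probability of a gap of length k is proportional to k^(-1.7), as the abstract reports, and takes the negative logarithm of the relative probability as a length-dependent cost. The truncation at a maximum gap length, the use of natural logarithms, and the function names are all assumptions made for illustration.

```python
import math

EXPONENT = 1.7  # from the abstract: P(gap of length k) is proportional to k**-1.7


def gap_length_distribution(max_len, exponent=EXPONENT):
    """Normalized power-law probabilities for gap lengths 1..max_len.

    max_len is an illustrative truncation, not a value from the paper.
    """
    weights = [k ** -exponent for k in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def log_penalty(k, exponent=EXPONENT):
    """Length-dependent cost, defined here as -ln(P(k) / P(1)) = exponent * ln(k)."""
    return exponent * math.log(k)


probs = gap_length_distribution(50)
for k in (1, 2, 5, 10, 20):
    # Columns: gap length, probability under the truncated power law, log cost
    print(k, round(probs[k - 1], 4), round(log_penalty(k), 3))
```

Under these assumptions the cost grows with the logarithm of gap length, so each additional residue adds less cost than the one before it. A linear penalty adds a constant amount per residue, which is the mismatch the abstract points to.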
Pages: 1065-1082
Page count: 18