Enzyme function less conserved than anticipated

Cited by: 269
Author
Rost, B
Affiliations
[1] Columbia Univ, CUBIC, Dept Biochem & Mol Biophys, New York, NY 10032 USA
[2] Columbia Univ, Ctr Computat Biol & Bioinformat C2B2, New York, NY 10032 USA
Keywords
genome annotation; conservation of protein function; enzyme classification; evolution; statistical significance; bootstrap; bioinformatics
DOI
10.1016/S0022-2836(02)00016-5
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
The level of sequence similarity that implies similarity in protein structure is well established. Recently, many groups proposed thresholds for similarity in sequence implying similarity in enzymatic function. All previous results suggest the strong conservation of enzymatic function above levels of 50% pairwise sequence identity. Here, I argue that all groups substantially overestimated the conservation of enzyme function because their data sets were either too biased, or too small. An unbiased analysis suggested that less than 30% of the pair fragments above 50% sequence identity have entirely identical EC numbers. Another surprising finding was that even BLAST E-values below 10^-50 did not suffice to automatically transfer enzyme function without errors. As expected, most misclassifications originated from similarities in relatively short regions and/or from transferring annotations for different domains. Both problems cannot be corrected easily by adjusting the thresholds for automatic transfer of genome annotations. A score relating sequence identity to alignment length (distance from HSSP-threshold) outperformed statistical BLAST scores for high sequence similarity. In particular, the distance score allowed error-free transfer of enzyme function for the 10% most similar enzyme pairs. The results illustrated how difficult it is to assess the conservation of protein function and to guarantee error-free genome annotations, in general: sets with millions of pair comparisons might not suffice to arrive at statistically significant conclusions. In practice, the revised detailed estimates for the sequence conservation of enzyme function may provide important benchmarks for everyday sequence analysis and for more cautious automatic genome annotations. © 2002 Elsevier Science Ltd. All rights reserved.
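The abstract refers to two quantities that are easy to make concrete: a score relating sequence identity to alignment length (distance from the HSSP threshold), and agreement between EC numbers. The sketch below is illustrative only. The abstract does not state the threshold formula; the curve used here is the form commonly attributed to earlier work on the HSSP curve and should be treated as an assumption, as should the helper names `hssp_threshold`, `hssp_distance`, and `shared_ec_levels`.

```python
import math


def hssp_threshold(length: int) -> float:
    """Percent identity on the assumed HSSP curve for an alignment of
    `length` residues. The piecewise form is an assumption, not taken
    from the abstract."""
    if length <= 11:
        return 100.0
    if length <= 450:
        exponent = -0.32 * (1.0 + math.exp(-length / 1000.0))
        return 480.0 * length ** exponent
    return 19.5


def hssp_distance(percent_identity: float, length: int) -> float:
    """Distance (in percentage points) of an alignment from the curve.
    Positive values lie above the threshold."""
    return percent_identity - hssp_threshold(length)


def shared_ec_levels(ec_a: str, ec_b: str) -> int:
    """Number of leading EC levels (0 to 4) on which two EC numbers agree.
    Unassigned levels written as '-' never count as agreement."""
    shared = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        if a != b or a == "-":
            break
        shared += 1
    return shared


if __name__ == "__main__":
    # A 100-residue alignment at 50% identity, compared against the curve.
    print(round(hssp_distance(50.0, 100), 1))
    # Two EC numbers that differ only at the fourth level.
    print(shared_ec_levels("1.1.1.1", "1.1.1.2"))
```

Under this reading, "entirely identical EC numbers" corresponds to `shared_ec_levels(...) == 4`, and a single distance value captures that short alignments need much higher identity than long ones to count as equally similar.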
Pages: 595-608
Number of pages: 14
Related papers
58 in total
  • [1] Do aligned sequences share the same fold?
    Abagyan, RA
    Batalov, S
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1997, 273 (01) : 355 - 368
  • [2] ALEXANDROV NN, 1998, HICCS 98 PAC S BIOC, P463
  • [3] Altschul SF, 1996, METHOD ENZYMOL, V266, P460
  • [4] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [5] Bioinformatics: From genome data to biological knowledge
    Andrade, MA
    Sander, C
    [J]. CURRENT OPINION IN BIOTECHNOLOGY, 1997, 8 (06) : 675 - 683
  • [6] The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000
    Bairoch, A
    Apweiler, R
    [J]. NUCLEIC ACIDS RESEARCH, 2000, 28 (01) : 45 - 48
  • [7] The Protein Data Bank
    Berman, HM
    Westbrook, J
    Feng, Z
    Gilliland, G
    Bhat, TN
    Weissig, H
    Shindyalov, IN
    Bourne, PE
    [J]. NUCLEIC ACIDS RESEARCH, 2000, 28 (01) : 235 - 242
  • [8] From genome sequences to protein function
    Bork, P
    Ouzounis, C
    Sander, C
    [J]. CURRENT OPINION IN STRUCTURAL BIOLOGY, 1994, 4 (03) : 393 - 403
  • [9] Errors in genome annotation
    Brenner, SE
    [J]. TRENDS IN GENETICS, 1999, 15 (04) : 132 - 133
  • [10] Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships
    Brenner, SE
    Chothia, C
    Hubbard, TJP
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1998, 95 (11) : 6073 - 6078