How well is enzyme function conserved as a function of pairwise sequence identity?

Cited by: 301
Authors
Tian, WD
Skolnick, J
Affiliations
[1] SUNY Buffalo, Ctr Excellence, Buffalo, NY 14203 USA
[2] Washington Univ, Dept Biol, St Louis, MO 63130 USA
Funding
National Institutes of Health (NIH)
Keywords
genome annotation; conservation of protein function; enzyme classification; sequence comparisons; PSI-BLAST
DOI
10.1016/j.jmb.2003.08.057
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Enzyme function conservation has been used to derive the threshold of sequence identity necessary to transfer function from a protein of known function to an unknown protein. Using pairwise sequence comparison, several studies suggested that when the sequence identity is above 40%, enzyme function is well conserved. In contrast, Rost argued that because of database bias, the results from such simple pairwise comparisons might be misleading. Thus, by grouping enzyme sequences into families based on sequence similarity and selecting representative sequences for comparison, he showed that enzyme function starts to diverge quickly when the sequence identity is below 70%. Here, we employ a strategy similar to Rost's to reduce the database bias; however, we classify enzyme families based not only on sequence similarity, but also on functional similarity, i.e. sequences in each family must have the same four digits or the same first three digits of the enzyme commission (EC) number. Furthermore, instead of selecting representative sequences for comparison, we calculate the function conservation of each enzyme family and then average the degree of enzyme function conservation across all enzyme families. Our analysis suggests that for functional transferability, 40% sequence identity can still be used as a confident threshold to transfer the first three digits of an EC number; however, to transfer all four digits of an EC number, above 60% sequence identity is needed to have at least 90% accuracy. Moreover, when PSI-BLAST is used, the magnitude of the E-value is found to be weakly correlated with the extent of enzyme function conservation in the third iteration of PSI-BLAST. As a result, functional annotation based on the E-values from PSI-BLAST should be used with caution. 
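The family-averaged measure described above (score function conservation within each enzyme family first, then average across families) can be illustrated with a small sketch. This is one plausible reading of the procedure, not the authors' code: the input tuple layout `(family_id, identity_percent, ec_a, ec_b)`, the identity binning, and the names `ec_match` and `family_averaged_conservation` are hypothetical. The `digits` parameter switches between matching all four EC fields and only the first three.

```python
import bisect
from collections import defaultdict
from statistics import mean


def ec_match(ec_a: str, ec_b: str, digits: int) -> bool:
    """True if the first `digits` fields of two EC numbers agree, e.g. '1.1.1.1'."""
    return ec_a.split(".")[:digits] == ec_b.split(".")[:digits]


def family_averaged_conservation(pairs, bin_edges, digits=4):
    """Mean, over enzyme families, of the per-family fraction of conserved pairs.

    pairs: iterable of (family_id, identity_percent, ec_a, ec_b) tuples
           (a hypothetical input layout, not the paper's data format).
    bin_edges: sorted lower bounds of identity bins, e.g. [40, 50, 60, 70].
    digits: 4 to require the full EC number, 3 for the first three fields.
    """
    # bin lower bound -> family id -> list of booleans (function conserved?)
    per_bin = defaultdict(lambda: defaultdict(list))
    for family, identity, ec_a, ec_b in pairs:
        i = bisect.bisect_right(bin_edges, identity) - 1
        if i < 0:
            continue  # identity below the lowest bin
        per_bin[bin_edges[i]][family].append(ec_match(ec_a, ec_b, digits))

    result = {}
    for lower, families in sorted(per_bin.items()):
        # Score each family separately first, so that large, heavily sampled
        # families do not dominate the average.
        per_family = [sum(flags) / len(flags) for flags in families.values()]
        result[lower] = mean(per_family)
    return result


pairs = [
    ("famA", 45.0, "1.1.1.1", "1.1.1.1"),
    ("famA", 47.0, "1.1.1.1", "1.1.1.2"),
    ("famB", 42.0, "3.2.1.4", "3.2.1.4"),
    ("famB", 65.0, "3.2.1.4", "3.2.1.4"),
]
print(family_averaged_conservation(pairs, [40, 50, 60, 70], digits=4))
# {40: 0.75, 60: 1.0}
print(family_averaged_conservation(pairs, [40, 50, 60, 70], digits=3))
# {40: 1.0, 60: 1.0}
```

In this toy data, pooling all three pairs in the 40-50% bin would give 2/3 for four-digit conservation, while averaging per family gives 0.75; the per-family step is what keeps one family's sampling density from setting the overall figure.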
We also show that by employing an enzyme family-specific sequence identity threshold above which 100% functional conservation is required, functional inference of unknown sequences can be accurately accomplished. However, this comes at a cost: those true positive sequences below this threshold cannot be uniquely identified. (C) 2003 Published by Elsevier Ltd.
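The trade-off in the family-specific threshold can be sketched as follows, under a simplifying assumption that may differ from the paper's exact procedure: for one family, the cutoff is set at the identity of the most similar database hit whose function differs, so every hit strictly above it is functionally conserved. The input layout `(identity_percent, same_function)` and the helpers `family_threshold` and `apply_threshold` are hypothetical.

```python
def family_threshold(hits):
    """Identity cutoff above which every hit shares the family's function.

    hits: list of (identity_percent, same_function) pairs for database sequences
          compared with one enzyme family (hypothetical input layout).
    Returns t such that all hits with identity > t have same_function == True.
    """
    # The most similar hit with a *different* function sets the cutoff;
    # if there is none, every hit can be accepted.
    non_conserved = [identity for identity, same in hits if not same]
    return max(non_conserved, default=float("-inf"))


def apply_threshold(hits):
    """Split hits into confident annotations and true positives that are lost."""
    t = family_threshold(hits)
    accepted = [h for h in hits if h[0] > t]
    missed_true = [h for h in hits if h[0] <= t and h[1]]
    return t, accepted, missed_true


hits = [(85.0, True), (62.0, True), (55.0, False), (48.0, True), (30.0, False)]
t, accepted, missed = apply_threshold(hits)
print(t)         # 55.0
print(accepted)  # [(85.0, True), (62.0, True)]
print(missed)    # [(48.0, True)]
```

Everything accepted above the cutoff is correct, but the true family member at 48% identity falls below it and cannot be uniquely assigned, which is the cost the abstract describes.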
Pages: 863-882
Page count: 20
References: 57