Computational assignment of the EC numbers for genomic-scale analysis of enzymatic reactions

Cited by: 101
Authors
Kotera, M [1]
Okuno, Y [1]
Hattori, M [1]
Goto, S [1]
Kanehisa, M [1]
Affiliations
[1] Kyoto Univ, Inst Chem Res, Bioinformat Ctr, Uji, Kyoto 6110011, Japan
DOI
10.1021/ja0466457
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The EC (Enzyme Commission) numbers represent a hierarchical classification of enzymatic reactions, but they are also commonly used as identifiers of enzymes or enzyme genes in the analysis of complete genomes. This duality of the EC numbers makes it possible to link the genomic repertoire of enzyme genes to the chemical repertoire of metabolic pathways, a process called metabolic reconstruction. Unfortunately, numerous reactions known to be present in various pathways will never receive EC numbers, because EC number assignment requires published articles that fully characterize the enzymes. Here we report a computerized method to automatically assign EC numbers up to the sub-subclass level, i.e., without the fourth serial number for substrate specificity, given pairs of substrates and products. The method is based on a new classification scheme of enzymatic reactions, named the RC (reaction classification) number. Each reaction in the current dataset of EC numbers is first decomposed into reactant pairs. Each pair is then structurally aligned to identify the reaction center, the matched region, and the difference region. The RC number represents the conversion patterns of atom types in these three regions. We examined the correspondence between computationally assigned RC numbers and manually assigned EC numbers by a jackknife cross-validation test and found that the EC sub-subclasses could be assigned with an accuracy of about 90%. Furthermore, we examined the correlation with genomic information as represented by the KEGG ortholog clusters (OC) and confirmed that the RC numbers are correlated not only with elementary reaction mechanisms but also with protein families.
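The evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how an RC number is mapped to an EC sub-subclass, so the majority-vote lookup below is an assumption, and the RC pattern labels (`rc-A`, `rc-B`, `rc-C`) and the function names are hypothetical placeholders. The sketch only shows what "EC number up to the sub-subclass" means and how a leave-one-out (jackknife) accuracy could be computed.

```python
from collections import Counter, defaultdict


def ec_sub_subclass(ec_number: str) -> str:
    """Drop the fourth (serial) field of an EC number, keeping class.subclass.sub-subclass."""
    parts = ec_number.strip().split(".")
    if len(parts) < 3:
        raise ValueError(f"EC number needs at least three fields: {ec_number!r}")
    return ".".join(parts[:3])


def jackknife_accuracy(records):
    """Leave-one-out accuracy of predicting the EC sub-subclass from an RC pattern.

    records: list of (rc_pattern, ec_number) tuples.
    Assumption (not from the abstract): the prediction for a held-out record is the
    most common sub-subclass among the other records sharing its RC pattern.
    A record with no such neighbours counts as a miss.
    """
    votes = defaultdict(Counter)
    for rc_pattern, ec_number in records:
        votes[rc_pattern][ec_sub_subclass(ec_number)] += 1

    correct = 0
    for rc_pattern, ec_number in records:
        truth = ec_sub_subclass(ec_number)
        remaining = votes[rc_pattern].copy()
        remaining[truth] -= 1      # hold out the record under test
        remaining = +remaining     # unary plus drops zero counts
        if remaining and remaining.most_common(1)[0][0] == truth:
            correct += 1
    return correct / len(records)


# Toy data: the RC pattern labels are hypothetical, not real RC numbers.
records = [
    ("rc-A", "1.1.1.1"),
    ("rc-A", "1.1.1.2"),
    ("rc-A", "1.1.1.37"),
    ("rc-B", "2.7.1.1"),
    ("rc-B", "2.7.1.2"),
    ("rc-C", "3.1.3.1"),  # singleton pattern: nothing left to vote after hold-out
]

if __name__ == "__main__":
    print(ec_sub_subclass("1.1.1.1"))
    print(jackknife_accuracy(records))
```

On this toy input, five of the six held-out records are recovered; the singleton pattern is the miss, which mirrors the general point that a reaction pattern seen only once cannot be validated by leave-one-out.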
Pages: 16487-16498
Page count: 12