Predicting enzyme subclass by functional domain composition and pseudo amino acid composition

Cited by: 78
Authors
Cai, YD
Chou, KC [1]
Affiliations
[1] Chinese Acad Sci, Shanghai Inst Biol Sci, Bioinformat Ctr, Shanghai 200031, Peoples R China
[2] Shanghai Ctr Bioinformat Technol, Shanghai 200235, Peoples R China
[3] Univ Manchester, Inst Sci & Technol, Biomol Sci Dept, Manchester M60 1QD, Lancs, England
[4] Gordon Life Sci Inst, San Diego, CA 92130 USA
Funding
National Natural Science Foundation of China;
Keywords
ENZYME database; 40% cutoff; functional domain; pseudo amino acid composition; ISort predictor; FunD-PseAA predictor; bioinformatics; proteomics;
DOI
10.1021/pr0500399
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
As a continuing effort to use the sequence approach to identify enzymatic function at a deeper level, investigations are extended from the main enzyme classes (Protein Sci. 2004, 13, 2857-2863) to their subclasses. This extension is indispensable for understanding the molecular mechanism of an enzyme in more detail. For each of the 6 main enzyme classes (i.e., oxidoreductase, transferase, hydrolase, lyase, isomerase, and ligase), a subclass training dataset is constructed. To reduce homology bias, a stringent cutoff is imposed: all entries included in the datasets have less than 40% sequence identity to each other. To capture the core features that are intimately related to biological function, each protein sample is represented by hybridizing the functional domain composition and the pseudo amino acid composition. On the basis of this hybrid representation, the FunD-PseAA predictor is established. Jackknife cross-validation tests show that the overall success rate in identifying the 21 subclasses of oxidoreductases is above 86%, and that the corresponding rates for the subclasses of the other 5 main enzyme classes are 94-97%. These high success rates imply that the FunD-PseAA predictor may become a useful tool for bioinformatics and proteomics in the post-genomic era.
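To illustrate what a jackknife (leave-one-out) test on a hybrid representation involves, here is a minimal Python sketch. It is not the authors' FunD-PseAA implementation. The nearest-neighbor rule with Euclidean distance, the function names, and the toy vectors and subclass labels are all assumptions made for illustration only. Each sample is held out once, predicted from the remaining samples, and the fraction of correct predictions gives the overall success rate.

```python
from math import sqrt


def hybrid_vector(domain_hits, pseaa):
    """Concatenate a functional-domain vector with a pseudo amino acid vector.

    Both inputs are hypothetical toy features, not real database output.
    """
    return list(domain_hits) + list(pseaa)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(query, samples, labels):
    """Return the label of the training sample closest to the query.

    Assumption: a plain nearest-neighbor rule stands in for the predictor.
    """
    best = min(range(len(samples)), key=lambda i: euclidean(query, samples[i]))
    return labels[best]


def jackknife_success_rate(samples, labels):
    """Hold out each sample in turn and predict it from all the others."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_label(samples[i], train_x, train_y) == labels[i]:
            correct += 1
    return correct / len(samples)


# Toy data: two invented subclasses with two samples each.
samples = [
    hybrid_vector([1, 0, 0], [0.10, 0.20]),
    hybrid_vector([1, 0, 0], [0.12, 0.18]),
    hybrid_vector([0, 1, 1], [0.80, 0.70]),
    hybrid_vector([0, 1, 1], [0.78, 0.75]),
]
labels = ["subclass A", "subclass A", "subclass B", "subclass B"]

rate = jackknife_success_rate(samples, labels)
print(f"jackknife success rate: {rate:.2f}")
```

Because every sample is used for testing exactly once and never appears in its own training set, the jackknife test gives a single deterministic success rate for a given dataset, unlike a random train/test split.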
Pages: 967-971
Page count: 5
Related papers
30 in total
[1]   [Anonymous], 1992, ENZYME NOMENCLATURE
[2]   The InterPro database, an integrated documentation resource for protein families, domains and functional sites [J].
Apweiler, R ;
Attwood, TK ;
Bairoch, A ;
Bateman, A ;
Birney, E ;
Biswas, M ;
Bucher, P ;
Cerutti, T ;
Corpet, F ;
Croning, MDR ;
Durbin, R ;
Falquet, L ;
Fleischmann, W ;
Gouzy, J ;
Hermjakob, H ;
Hulo, N ;
Jonassen, I ;
Kahn, D ;
Kanapin, A ;
Karavidopoulou, Y ;
Lopez, R ;
Marx, B ;
Mulder, NJ ;
Oinn, TM ;
Pagni, M ;
Servant, F ;
Sigrist, CJA ;
Zdobnov, EM .
NUCLEIC ACIDS RESEARCH, 2001, 29 (01) :37-40
[3]   Enzyme family classification by support vector machines [J].
Cai, CZ ;
Han, LY ;
Ji, ZL ;
Chen, YZ .
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2004, 55 (01) :66-76
[4]   Using functional domain composition to predict enzyme family classes [J].
Cai, YD ;
Chou, KC .
JOURNAL OF PROTEOME RESEARCH, 2005, 4 (01) :109-111
[5]   A JOINT PREDICTION OF THE FOLDING TYPES OF 1490 HUMAN PROTEINS FROM THEIR GENETIC CODONS [J].
CHOU, JJW ;
ZHANG, CT .
JOURNAL OF THEORETICAL BIOLOGY, 1993, 161 (02) :251-262
[6]   Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes [J].
Chou, KC .
BIOINFORMATICS, 2005, 21 (01) :10-19
[7]   Predicting enzyme family class in a hybridization space [J].
Chou, KC ;
Cai, YD .
PROTEIN SCIENCE, 2004, 13 (11) :2857-2863
[8]   Predicting protein structural class by functional domain composition [J].
Chou, KC ;
Cai, YD .
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, 2004, 321 (04) :1007-1009
[9]   Prediction of enzyme family classes [J].
Chou, KC ;
Elrod, DW .
JOURNAL OF PROTEOME RESEARCH, 2003, 2 (02) :183-190
[10]   PREDICTION OF PROTEIN STRUCTURAL CLASSES [J].
CHOU, KC ;
ZHANG, CT .
CRITICAL REVIEWS IN BIOCHEMISTRY AND MOLECULAR BIOLOGY, 1995, 30 (04) :275-349