Predicting enzyme family classes by hybridizing gene product composition and pseudo-amino acid composition

被引:80
作者
Cai, YD
Zhou, GP
Chou, KC
机构
[1] Gordon Life Sci Inst, San Diego, CA 92130 USA
[2] Tianjin Normal Univ, Tianjin Inst Bioinformat & Drug Discovery, Tianjin 300074, Peoples R China
[3] Harvard Univ, Sch Med, Beth Israel Med Ctr, Boston, MA 02115 USA
[4] Univ Manchester, Inst Sci & Technol, Biomol Sci Dept, Manchester M60 1QD, Lancs, England
关键词
classification of enzyme commission; gene ontology; enzymatic attribute; quasi sequence-order effect; nearest neighbor predictor; bioinformatics; proteomics;
D O I
10.1016/j.jtbi.2004.11.017
中图分类号
Q [生物科学];
学科分类号
07 ; 0710 ; 09 ;
摘要
A: new method has been developed to predict the enzymatic attribute of proteins by hybridizing the gene product composition and pseudo amino acid composition. As a demonstration, a working dataset was generated with a cutoff of 60% sequence identity to avoid redundancy and bias in statistical prediction. The dataset thus constructed contains 39 989 protein sequences, of which 27 469 are non-enzymes and 12 520 enzymes that were further classified into 6 enzyme family classes according to their 6 main EC (Enzyme Con mission) numbers (2314 are oxidoreductases, 3653 transferases, 3246 hydrolases, 1307 lyases, 676 isomerases,and 1324 ligases). The overall success rate by the jackknife test for the identification between enzyme and non-enzyme was 94%, and that for the identification among the 6 enzyme family classes was 98%. It is anticipated that, with the rapid increase of protein sequences entering into databanks, the current method will become a useful automated tool in identifying the enzymatic attribute of a newly found protein sequence. (C) 2004 Elsevier Ltd. All rights reserved.
引用
收藏
页码:145 / 149
页数:5
相关论文
共 26 条
[1]  
ALBERTS B, 1994, MOL BIOL CELL, pCH1
[2]  
[Anonymous], 1992, ENZYME NOMENCLATURE
[3]   The InterPro database, an integrated documentation resource for protein families, domains and functional sites [J].
Apweiler, R ;
Attwood, TK ;
Bairoch, A ;
Bateman, A ;
Birney, E ;
Biswas, M ;
Bucher, P ;
Cerutti, T ;
Corpet, F ;
Croning, MDR ;
Durbin, R ;
Falquet, L ;
Fleischmann, W ;
Gouzy, J ;
Hermjakob, H ;
Hulo, N ;
Jonassen, I ;
Kahn, D ;
Kanapin, A ;
Karavidopoulou, Y ;
Lopez, R ;
Marx, B ;
Mulder, NJ ;
Oinn, TM ;
Pagni, M ;
Servant, F ;
Sigrist, CJA ;
Zdobnov, EM .
NUCLEIC ACIDS RESEARCH, 2001, 29 (01) :37-40
[4]   Gene Ontology: tool for the unification of biology [J].
Ashburner, M ;
Ball, CA ;
Blake, JA ;
Botstein, D ;
Butler, H ;
Cherry, JM ;
Davis, AP ;
Dolinski, K ;
Dwight, SS ;
Eppig, JT ;
Harris, MA ;
Hill, DP ;
Issel-Tarver, L ;
Kasarskis, A ;
Lewis, S ;
Matese, JC ;
Richardson, JE ;
Ringwald, M ;
Rubin, GM ;
Sherlock, G .
NATURE GENETICS, 2000, 25 (01) :25-29
[5]   The SWISS-PROT protein sequence data bank and its supplement TrEMBL [J].
Bairoch, A ;
Apweller, R .
NUCLEIC ACIDS RESEARCH, 1997, 25 (01) :31-36
[6]   The ENZYME database in 2000 [J].
Bairoch, A .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :304-305
[7]   A JOINT PREDICTION OF THE FOLDING TYPES OF 1490 HUMAN PROTEINS FROM THEIR GENETIC CODONS [J].
CHOU, JJW ;
ZHANG, CT .
JOURNAL OF THEORETICAL BIOLOGY, 1993, 161 (02) :251-262
[8]   Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes [J].
Chou, KC .
BIOINFORMATICS, 2005, 21 (01) :10-19
[9]   A new hybrid approach to predict subcellular localization of proteins by incorporating gene ontology [J].
Chou, KC ;
Cai, YD .
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, 2003, 311 (03) :743-747
[10]  
Chou KC, 1999, PROTEINS, V34, P137, DOI 10.1002/(SICI)1097-0134(19990101)34:1<137::AID-PROT11>3.0.CO