Predicting enzyme family class in a hybridization space

Cited by: 88
Authors
Chou, KC [1]
Cai, YD
Affiliations
[1] Gordon Life Sci Inst, San Diego, CA 92130 USA
[2] TIBDD, Tianjin, Peoples R China
[3] UMIST, Biomol Sci Dept, Manchester M60 1QD, Lancs, England
Keywords
ENZYME database; 40% cutoff; Gene Ontology; pseudo-amino-acid composition; quasi-sequence-order effect; ISort predictor; GO-PseAA predictor; bioinformatics; proteomics
DOI
10.1110/ps.04981104
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Given the sequence of a protein, how can we predict whether it is an enzyme or a non-enzyme? If it is an enzyme, to which enzyme family class does it belong? Because these questions bear directly on the biological function of a protein and the object on which it acts, their importance is self-evident. Particularly with the explosion of protein sequences entering data banks and the much slower progress of biochemical experiments in determining their functions, it is highly desirable to develop an automated method that can answer these questions quickly. By hybridizing the Gene Ontology and pseudo-amino-acid composition, we have introduced a new method, called the GO-PseAA predictor, which operates in a hybridization space. To avoid redundancy and bias, demonstrations were performed on a data set in which none of the proteins in an individual class has ≥40% sequence identity to any other. The overall success rate obtained by the jackknife cross-validation test was 93% for distinguishing enzymes from non-enzymes and 94% for identifying the enzyme family among the following six main Enzyme Commission (EC) classes: (1) oxidoreductase, (2) transferase, (3) hydrolase, (4) lyase, (5) isomerase, and (6) ligase. The corresponding rates for the independent data set test were 98% and 97%, respectively.
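The success rates above come from the jackknife (leave-one-out) test: each protein is held out in turn, its class is predicted from all the remaining proteins, and the overall success rate is the fraction predicted correctly. The Python below is a minimal sketch of that evaluation loop only, under loudly stated assumptions. It is not the GO-PseAA predictor. Proteins are represented by their plain 20-D amino acid composition (what pseudo-amino-acid composition reduces to when its sequence-order correlation factors are given zero weight), no Gene Ontology features are used, and the classifier is a hypothetical 1-nearest-neighbor rule with Euclidean distance. All function names and the toy sequences are illustrative.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> list[float]:
    """20-D amino acid composition (normalized residue frequencies).

    Simplifying assumption: this is a stand-in for the pseudo-amino-acid
    composition with no sequence-order terms, not the paper's feature set.
    """
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def euclidean(u: list[float], v: list[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict_1nn(query, train_x, train_y):
    """Hypothetical classifier: label of the nearest training vector
    (on exact ties, the earliest training example wins)."""
    best = min(range(len(train_x)), key=lambda i: euclidean(query, train_x[i]))
    return train_y[best]


def jackknife_success_rate(seqs: list[str], labels: list[str]) -> float:
    """Leave-one-out test: predict each protein from all the others and
    return the overall fraction of correct predictions."""
    feats = [aa_composition(s) for s in seqs]
    correct = 0
    for i in range(len(seqs)):
        train_x = feats[:i] + feats[i + 1:]   # every protein except the i-th
        train_y = labels[:i] + labels[i + 1:]
        if predict_1nn(feats[i], train_x, train_y) == labels[i]:
            correct += 1
    return correct / len(seqs)


if __name__ == "__main__":
    # Toy, made-up sequences and labels for illustration only.
    seqs = ["AAAAGG", "AAAGGG", "KKKKRR", "KKKRRR"]
    labels = ["enzyme", "enzyme", "non-enzyme", "non-enzyme"]
    print(jackknife_success_rate(seqs, labels))  # prints 1.0
```

Because every protein is tested exactly once against a model that never saw it, the jackknife result is deterministic for a given data set, which is one reason it is often preferred over random sub-sampling for reporting a single overall success rate.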
Pages: 2857-2863
Page count: 7