Using functional domain composition and support vector machines for prediction of protein subcellular location

Cited by: 463
Authors
Chou, KC
Cai, YD
Affiliations
[1] Pharmacia & Upjohn Inc, Upjohn Labs, Kalamazoo, MI 49001 USA
[2] Chinese Acad Sci, Shanghai Res Ctr Biotechnol, Shanghai 200233, Peoples R China
DOI
10.1074/jbc.M204161200
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Proteins are generally classified into the following 12 subcellular locations: 1) chloroplast, 2) cytoplasm, 3) cytoskeleton, 4) endoplasmic reticulum, 5) extracellular, 6) Golgi apparatus, 7) lysosome, 8) mitochondria, 9) nucleus, 10) peroxisome, 11) plasma membrane, and 12) vacuole. Because the function of a protein is closely correlated with its subcellular location, with the rapid increase in new protein sequences entering into databanks, it is vitally important for both basic research and pharmaceutical industry to establish a high throughput tool for predicting protein subcellular location. In this paper, a new concept, the so-called "functional domain composition" is introduced. Based on the novel concept, the representation for a protein can be defined as a vector in a high-dimensional space, where each of the clustered functional domains derived from the protein universe serves as a vector base. With such a novel representation for a protein, the support vector machine (SVM) algorithm is introduced for predicting protein subcellular location. High success rates are obtained by the self-consistency test, jackknife test, and independent dataset test, respectively. The current approach not only can play an important complementary role to the powerful covariant discriminant algorithm based on the pseudo amino acid composition representation (Chou, K. C. (2001) Proteins Struct. Funct. Genet. 43, 246-255; Correction (2001) Proteins Struct. Funct. Genet. 44, 60), but also may greatly stimulate the development of this area.
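The representation described in the abstract can be illustrated with a minimal sketch. It assumes a binary encoding (a component is 1 if the protein has a hit against that functional domain, 0 otherwise), which the abstract does not specify; the function name `domain_composition` and the four domain labels are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Sequence


def domain_composition(hits: Iterable[str], domain_basis: Sequence[str]) -> List[int]:
    """Encode a protein as a vector with one component per basis domain.

    Assumption (not stated in the abstract): a component is 1 when the
    protein has a hit for that domain and 0 otherwise.
    """
    hit_set = set(hits)
    return [1 if domain in hit_set else 0 for domain in domain_basis]


# Hypothetical basis; a real one would hold every clustered functional domain.
DOMAIN_BASIS = ["kinase", "zinc_finger", "signal_peptide", "transmembrane"]

protein_a = domain_composition(["zinc_finger"], DOMAIN_BASIS)
protein_b = domain_composition(["transmembrane", "signal_peptide"], DOMAIN_BASIS)

print(protein_a)
print(protein_b)
```

Vectors of this form, paired with known location labels, could then serve as training input for a standard SVM implementation.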
Pages: 45765-45769
Page count: 5
Related papers
39 in total
[1]  
ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
[2]  
VAPNIK VN, 1995, NATURE STAT LEARNING
[3]   The SWISS-PROT protein sequence data bank and its supplement TrEMBL [J].
Bairoch, A ;
Apweiler, R .
NUCLEIC ACIDS RESEARCH, 1997, 25 (01) :31-36
[4]   Predicting protein-protein interactions from primary structure [J].
Bock, JR ;
Gough, DA .
BIOINFORMATICS, 2001, 17 (05) :455-460
[5]   Support vector machines for prediction of protein subcellular location by incorporating quasi-sequence-order effect [J].
Cai, YD ;
Liu, XJ ;
Xu, XB ;
Chou, KC .
JOURNAL OF CELLULAR BIOCHEMISTRY, 2002, 84 (02) :343-348
[6]   Is it a paradox or misinterpretation? [J].
Cai, YD .
PROTEINS-STRUCTURE FUNCTION AND GENETICS, 2001, 43 (03) :336-338
[7]  
Cai Yu-Dong, 2000, Molecular Cell Biology Research Communications, V4, P230, DOI 10.1006/mcbr.2001.0285
[8]  
Cai Yu-Dong, 2000, Molecular Cell Biology Research Communications, V4, P172, DOI 10.1006/mcbr.2001.0269
[9]   Relation between amino acid composition and cellular location of proteins [J].
Cedano, J ;
Aloy, P ;
PerezPons, JA ;
Querol, E .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 266 (03) :594-600
[10]   Using discriminant function for prediction of subcellular location of prokaryotic proteins [J].
Chou, KC ;
Elrod, DW .
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, 1998, 252 (01) :63-68