Using functional domain composition and support vector machines for prediction of protein subcellular location

Cited by: 463
Authors
Chou, KC
Cai, YD
Affiliations
[1] Pharmacia & Upjohn Inc, Upjohn Labs, Kalamazoo, MI 49001 USA
[2] Chinese Acad Sci, Shanghai Res Ctr Biotechnol, Shanghai 200233, Peoples R China
DOI
10.1074/jbc.M204161200
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Proteins are generally classified into the following 12 subcellular locations: 1) chloroplast, 2) cytoplasm, 3) cytoskeleton, 4) endoplasmic reticulum, 5) extracellular, 6) Golgi apparatus, 7) lysosome, 8) mitochondria, 9) nucleus, 10) peroxisome, 11) plasma membrane, and 12) vacuole. Because the function of a protein is closely correlated with its subcellular location, with the rapid increase in new protein sequences entering into databanks, it is vitally important for both basic research and pharmaceutical industry to establish a high throughput tool for predicting protein subcellular location. In this paper, a new concept, the so-called "functional domain composition" is introduced. Based on the novel concept, the representation for a protein can be defined as a vector in a high-dimensional space, where each of the clustered functional domains derived from the protein universe serves as a vector base. With such a novel representation for a protein, the support vector machine (SVM) algorithm is introduced for predicting protein subcellular location. High success rates are obtained by the self-consistency test, jackknife test, and independent dataset test, respectively. The current approach not only can play an important complementary role to the powerful covariant discriminant algorithm based on the pseudo amino acid composition representation (Chou, K. C. (2001) Proteins Struct. Funct. Genet. 43, 246-255; Correction (2001) Proteins Struct. Funct. Genet. 44, 60), but also may greatly stimulate the development of this area.
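The representation described in the abstract can be illustrated with a minimal Python sketch. It assumes a binary presence/absence encoding (component i is 1 when the protein contains domain i), which the abstract does not spell out. The domain names, the toy proteins, and the helper names `domain_composition`, `nearest_label`, and `jackknife_accuracy` are all hypothetical. The classifier is a simple nearest-neighbour stand-in, not the SVM used in the paper; it is included only so that the jackknife (leave-one-out) loop can run.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical domain vocabulary; a real one would come from a clustered
# functional-domain database, which is not reproduced here.
DOMAINS: List[str] = ["kinase", "zinc_finger", "signal_peptide", "transit_peptide"]

Sample = Tuple[List[int], str]  # (domain-composition vector, location label)


def domain_composition(hits: Set[str], domains: Sequence[str] = DOMAINS) -> List[int]:
    """Return a 0/1 vector: component i is 1 if the protein contains domain i.

    Assumption: binary presence/absence encoding.
    """
    return [1 if d in hits else 0 for d in domains]


def nearest_label(query: List[int], train: List[Sample]) -> str:
    """Stand-in classifier (NOT an SVM): return the label of the training
    vector that has the largest dot product with the query."""
    best = max(train, key=lambda s: sum(q * x for q, x in zip(query, s[0])))
    return best[1]


def jackknife_accuracy(samples: List[Sample]) -> float:
    """Jackknife (leave-one-out) test: predict each protein from all others."""
    correct = 0
    for i, (vec, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_label(vec, rest) == label:
            correct += 1
    return correct / len(samples)


# Made-up toy data, for illustration only.
toy: List[Sample] = [
    (domain_composition({"zinc_finger"}), "nucleus"),
    (domain_composition({"zinc_finger", "kinase"}), "nucleus"),
    (domain_composition({"signal_peptide"}), "extracellular"),
    (domain_composition({"signal_peptide", "transit_peptide"}), "extracellular"),
]

print(jackknife_accuracy(toy))
```

To follow the paper's approach more closely, `nearest_label` would be replaced by a trained SVM; the encoding and the leave-one-out loop would stay the same.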
Pages: 45765-45769
Number of pages: 5
Related papers
39 in total
[11] Prediction of protein subcellular locations by incorporating quasi-sequence-order effect [J].
Chou, KC.
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, 2000, 278 (02): 477-483
[12] Chou KC, 1999, PROTEINS, V34, P137, DOI 10.1002/(SICI)1097-0134(19990101)34:1<137::AID-PROT11>3.0.CO;2-O
[14] Protein subcellular location prediction [J].
Chou, KC; Elrod, DW.
PROTEIN ENGINEERING, 1999, 12 (02): 107-118
[15] Prediction of protein structural classes [J].
Chou, KC; Zhang, CT.
CRITICAL REVIEWS IN BIOCHEMISTRY AND MOLECULAR BIOLOGY, 1995, 30 (04): 275-349
[16] A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space [J].
Chou, KC.
PROTEINS-STRUCTURE FUNCTION AND GENETICS, 1995, 21 (04): 319-344
[17] Chou KC, 1998, PROTEINS, V31, P97, DOI 10.1002/(SICI)1097-0134(19980401)31:1<97::AID-PROT8>3.3.CO;2-Y
[19] Chou KC, 1994, J BIOL CHEM, V269, P22014
[20] Prediction of protein cellular attributes using pseudo-amino acid composition [J].
Chou, KC.
PROTEINS-STRUCTURE FUNCTION AND GENETICS, 2001, 43 (03): 246-255