Prediction of protein subcellular location using a combined feature of sequence

Cited by: 78
Authors
Gao, QB [1 ]
Wang, ZZ [1 ]
Yan, C [1 ]
Du, YH [1 ]
Affiliations
[1] Natl Univ Def Technol, Inst Automat, Changsha 410073, Peoples R China
Source
FEBS LETTERS | 2005 / Vol. 579 / Issue 16
Funding
National Natural Science Foundation of China;
Keywords
protein subcellular location; combined feature; amino acid composition; dipeptide composition; physicochemical property; nearest neighbor; jackknife test;
DOI
10.1016/j.febslet.2005.05.021
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
To understand the structure and function of a protein, an important task is to know where it occurs in the cell. Thus, a computational method for properly predicting the subcellular location of proteins would be significant in interpreting the original data produced by the large-scale genome sequencing projects. The present work tries to explore an effective method for extracting features from protein primary sequence and find a novel measurement of similarity among proteins for classifying a protein to its proper subcellular location. We considered four locations in eukaryotic cells and three locations in prokaryotic cells, which have been investigated by several groups in the past. A combined feature of primary sequence defined as a 430D (dimensional) vector was utilized to represent a protein, including 20 amino acid compositions, 400 dipeptide compositions and 10 physicochemical properties. To evaluate the prediction performance of this encoding scheme, a jackknife test based on nearest neighbor algorithm was employed. The prediction accuracies for cytoplasmic, extracellular, mitochondrial, and nuclear proteins in the former dataset were 86.3%, 89.2%, 73.5% and 89.4%, respectively, and the total prediction accuracy reached 86.3%. For cytoplasmic, extracellular, and periplasmic proteins in the latter dataset, the prediction accuracies were 97.4%, 86.0%, and 79.7%, respectively, and the total prediction accuracy was 92.5%. The results indicate that this method outperforms some existing approaches based on amino acid composition or amino acid composition and dipeptide composition. (c) 2005 Published by Elsevier B.V. on behalf of the Federation of European Biochemical Societies.
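The encoding and evaluation scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not list the 10 physicochemical properties, so only the 20 amino acid and 400 dipeptide compositions are built here (420 of the 430 dimensions). The abstract also does not define the similarity measure it proposes, so plain Euclidean distance is used as a stand-in. The function names and the toy sequences are hypothetical.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def composition_features(seq):
    """Return a 420-D vector: 20 amino acid + 400 dipeptide frequencies.

    The paper's 10 physicochemical features are omitted because the
    abstract does not specify them (assumption of this sketch).
    """
    seq = "".join(c for c in seq.upper() if c in AMINO_ACIDS)
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two standard residues")
    aa = [seq.count(a) / n for a in AMINO_ACIDS]
    pair_counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n - 1):
        pair_counts[seq[i:i + 2]] += 1
    dp = [pair_counts[d] / (n - 1) for d in DIPEPTIDES]
    return aa + dp


def euclidean(u, v):
    """Stand-in distance; the paper's own similarity measure is not given."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def jackknife_accuracy(vectors, labels, distance=euclidean):
    """Leave-one-out test: label each protein by its nearest other protein."""
    correct = 0
    for i, query in enumerate(vectors):
        nearest = min(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: distance(query, vectors[j]),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(vectors)


# Toy sequences made up for illustration; not from the paper's datasets.
toy = [
    ("MKKLLAAKKLLAAKKL", "cytoplasmic"),
    ("MKKLAAKKLLAKKLLA", "cytoplasmic"),
    ("MSSTTGGSSTTGGSST", "extracellular"),
    ("MSTTGGSSTGGSSTTG", "extracellular"),
]
vectors = [composition_features(s) for s, _ in toy]
labels = [lab for _, lab in toy]
print(len(vectors[0]), jackknife_accuracy(vectors, labels))
```

In a jackknife test each protein is held out in turn and classified using all remaining proteins, so every sample serves once as the test case. The `distance` parameter is left pluggable so that a different similarity measure can be swapped in.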
Pages: 3444-3448
Page count: 5