Predicting subcellular localization of proteins using machine-learned classifiers

Times cited: 249
Authors
Lu, Z [1]
Szafron, D [1]
Greiner, R [1]
Lu, P [1]
Wishart, DS [1]
Poulin, B [1]
Anvik, J [1]
Macdonell, C [1]
Eisner, R [1]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
DOI
10.1093/bioinformatics/btg447
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Identifying the destination or localization of proteins is key to understanding their function and facilitating their purification. A number of existing computational prediction methods are based on sequence analysis. However, these methods are limited in scope, accuracy and most particularly breadth of coverage. Rather than using sequence information alone, we have explored the use of database text annotations from homologs and machine learning to substantially improve the prediction of subcellular location.
Results: We have constructed five machine-learning classifiers for predicting subcellular localization of proteins from animals, plants, fungi, Gram-negative bacteria and Gram-positive bacteria, which are 81% accurate for fungi and 92-94% accurate for the other four categories. These are the most accurate subcellular predictors across the widest set of organisms ever published. Our predictors are part of the Proteome Analyst web-service.
Pages: 547-556
Page count: 10