Going from where to why: interpretable prediction of protein subcellular localization

Cited by: 126
Authors
Briesemeister, Sebastian [1]
Rahnenfuehrer, Joerg [2]
Kohlbacher, Oliver [1]
Affiliations
[1] Univ Tubingen, Div Simulat Biol Syst, Tubingen, Germany
[2] TU Dortmund Univ, Dept Stat, Dortmund, Germany
Keywords
SUPPORT VECTOR MACHINES; GENE ONTOLOGY TERMS; SEQUENCE; LOCATION; CLASSIFICATION; MITOCHONDRIAL; PROTEOMES; SIGNAL; SITES; TOOL;
DOI
10.1093/bioinformatics/btq115
CLC number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: Protein subcellular localization is pivotal to understanding a protein's function. Computational prediction of subcellular localization has become a viable alternative to experimental approaches. While current machine learning-based methods yield good prediction accuracy, most of them suffer from two key problems: a lack of interpretability and difficulty handling proteins with multiple locations.
Results: We present YLoc, a novel method for predicting protein subcellular localization that addresses these issues. Owing to its simple architecture, YLoc can identify the features of a protein sequence that contribute to its subcellular localization, e.g. localization signals or motifs relevant to protein sorting. We present several example applications in which YLoc identifies the sequence features responsible for protein localization, and thus reveals not only to which location a protein is transported, but also why it is transported there. YLoc also provides a confidence estimate for each prediction, so the user can decide what level of error is acceptable. Owing to its probabilistic approach and the use of several thousand dual-targeted proteins, YLoc is able to predict multiple locations per protein. YLoc was benchmarked on several independent datasets for protein subcellular localization and performs on par with other state-of-the-art predictors. When low-confidence predictions are disregarded, YLoc can achieve prediction accuracies of over 90%. Moreover, we show that YLoc reliably predicts multiple locations and outperforms the best predictors in this area.
Pages: 1232-1238
Page count: 7