MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction

Cited by: 220
Authors
Blum, Torsten [1 ]
Briesemeister, Sebastian [1 ]
Kohlbacher, Oliver [1 ]
Affiliations
[1] Univ Tubingen, ZBIT WSI, Div Simulat Biol Syst, D-72074 Tubingen, Germany
Source
BMC BIOINFORMATICS | 2009 / Volume 10
Keywords
SUPPORT VECTOR MACHINES; ENSEMBLE CLASSIFIER; LOCATION PREDICTION; SEQUENCE; CELL; PEPTIDES; MPLOC; TEXT; PLOC; TOOL;
DOI
10.1186/1471-2105-10-274
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Knowledge of the subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology, since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, there is still a need for prediction methods that show more robustness and higher accuracy.
Results: We extended our previous MultiLoc predictor by incorporating phylogenetic profiles and Gene Ontology terms. Two different datasets were used for training the system, resulting in two versions of this high-accuracy prediction method. One version is specialized for globular proteins and predicts up to five localizations, whereas a second version covers all eleven main eukaryotic subcellular localizations. In a benchmark study with five localizations, MultiLoc2 performs considerably better than other methods for animal and plant proteins and comparably for fungal proteins. Furthermore, MultiLoc2 performs clearly better when using a second dataset that extends the benchmark study to all eleven main eukaryotic subcellular localizations.
Conclusion: MultiLoc2 is an extensive high-performance subcellular protein localization prediction system. By incorporating phylogenetic profiles and Gene Ontology terms, MultiLoc2 yields higher accuracies than its previous version. Moreover, it outperforms other prediction systems in two benchmark studies. MultiLoc2 is available as a user-friendly, free web service at: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc2.
Pages: 11