A New Method for Predicting the Subcellular Localization of Eukaryotic Proteins with Both Single and Multiple Sites: Euk-mPLoc 2.0

Cited by: 340
Authors
Chou, Kuo-Chen [1, 2]
Shen, Hong-Bin [1, 2]
Affiliations
[1] Gordon Life Sci Inst, San Diego, CA USA
[2] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200030, Peoples R China
Source
PLOS ONE | 2010 / Vol. 5 / Issue 03
Funding
National Natural Science Foundation of China;
Keywords
AMINO-ACID-COMPOSITION; SUPPORT VECTOR MACHINES; LOCATION PREDICTION; APOPTOSIS PROTEINS; GENE ONTOLOGY; DOMAIN; DATABASE; BIOINFORMATICS; CLASSIFICATION; DISCRIMINANT;
DOI
10.1371/journal.pone.0009931
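Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07; 0710; 09;
Abstract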
Information on the subcellular locations of proteins is important for in-depth studies of cell biology. It is also very useful for proteomics, systems biology, and drug development. However, most existing methods for predicting protein subcellular location cover only 5 to 12 location sites. They are also limited to single-location proteins and hence fail for multiplex proteins, which can simultaneously exist at, or move between, two or more location sites. Multiplex proteins of this kind usually possess important biological functions worthy of special notice. A new predictor called "Euk-mPLoc 2.0" was developed by hybridizing gene ontology information, functional domain information, and sequential evolutionary information through three different modes of pseudo amino acid composition. It can be used to identify eukaryotic proteins among the following 22 locations: (1) acrosome, (2) cell wall, (3) centriole, (4) chloroplast, (5) cyanelle, (6) cytoplasm, (7) cytoskeleton, (8) endoplasmic reticulum, (9) endosome, (10) extracell, (11) Golgi apparatus, (12) hydrogenosome, (13) lysosome, (14) melanosome, (15) microsome, (16) mitochondria, (17) nucleus, (18) peroxisome, (19) plasma membrane, (20) plastid, (21) spindle pole body, and (22) vacuole. Compared with existing methods for predicting eukaryotic protein subcellular localization, the new predictor is much more powerful and flexible, particularly in dealing with proteins with multiple locations and proteins without available accession numbers. For a newly constructed stringent benchmark dataset, which contains both single- and multiple-location proteins and in which no protein has >= 25% pairwise sequence identity to any other in the same location, the overall jackknife success rate achieved by Euk-mPLoc 2.0 is more than 24% higher than that of any existing predictor.
As a user-friendly web-server, Euk-mPLoc 2.0 is freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/euk-multi-2/. For a query protein sequence of 400 amino acids, the web-server takes about 15 seconds to yield the predicted result; longer sequences usually take more time. It is anticipated that the novel approach and the powerful predictor presented in this paper will have a significant impact on Molecular Cell Biology, Systems Biology, Proteomics, Bioinformatics, and Drug Development.
Pages: 9