MemType-2L: A Web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM

Cited by: 449
Authors
Chou, Kuo-Chen
Shen, Hong-Bin
Affiliations
[1] Gordon Life Sci Inst, San Diego, CA 92130 USA
[2] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200030, Peoples R China
Keywords
membrane protein type; protein evolution; Pse-PSSM; OET-KNN; ensemble classifier; fusion; MemType-2L;
DOI
10.1016/j.bbrc.2007.06.027
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Given an uncharacterized protein sequence, how can we identify whether it is a membrane protein or not? If it is, which membrane protein type does it belong to? These questions are important because they are closely related to the biological function of the query protein and to its interactions with other molecules in a biological system. In particular, with the avalanche of protein sequences generated in the Post-Genomic Age and the much slower progress in determining their functions by biochemical experiments, it is highly desirable to develop an automated method that can help address these questions. In this study, a 2-layer predictor, called MemType-2L, has been developed: the 1st-layer prediction engine identifies a query protein as membrane or non-membrane; if it is a membrane protein, the process automatically continues with the 2nd-layer prediction engine, which further identifies its type among the following eight categories: (1) type I, (2) type II, (3) type III, (4) type IV, (5) multipass, (6) lipid-chain-anchored, (7) GPI-anchored, and (8) peripheral. MemType-2L is characterized by incorporating evolution information through representing the protein samples with Pse-PSSM (Pseudo Position-Specific Score Matrix) vectors, and by containing an ensemble classifier formed by fusing many powerful individual OET-KNN (Optimized Evidence-Theoretic K-Nearest Neighbor) classifiers. The success rates obtained by MemType-2L on a newly constructed stringent dataset, by both the jackknife test and the independent dataset test, are quite high, indicating that MemType-2L may become a very useful high-throughput tool. As a Web server, MemType-2L is freely accessible to the public at http://chou.med.harvard.edu/bioinf/MenType. (C) 2007 Elsevier Inc. All rights reserved.
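The two-layer flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a plain majority-vote K-nearest-neighbor classifier stands in for OET-KNN, the ensemble is formed by varying only k (an assumption made for illustration), and the feature vectors are assumed to have been computed elsewhere (for example as Pse-PSSM vectors). All function names here are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]
Sample = Tuple[Vector, str]  # (feature vector, class label)

# The eight 2nd-layer categories listed in the abstract.
MEMBRANE_TYPES = (
    "type I", "type II", "type III", "type IV", "multipass",
    "lipid-chain-anchored", "GPI-anchored", "peripheral",
)


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train: List[Sample], query: Vector, k: int) -> str:
    """Plain majority-vote KNN (a stand-in for OET-KNN, not the same method)."""
    nearest = sorted(train, key=lambda s: squared_distance(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def ensemble_predict(train: List[Sample], query: Vector,
                     ks: Sequence[int] = (1, 3, 5)) -> str:
    """Fuse several KNN classifiers (here differing only in k) by voting."""
    votes = Counter(knn_predict(train, query, k) for k in ks)
    return votes.most_common(1)[0][0]


def two_layer_predict(layer1_train: List[Sample],
                      layer2_train: List[Sample],
                      query: Vector) -> Tuple[str, Optional[str]]:
    """Layer 1: membrane vs. non-membrane. Layer 2: membrane type, if any."""
    if ensemble_predict(layer1_train, query) != "membrane":
        return "non-membrane", None
    return "membrane", ensemble_predict(layer2_train, query)
```

In this sketch the 2nd-layer classifier is consulted only when the 1st layer answers "membrane", which mirrors the automatic hand-off described in the abstract.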
Pages: 339-345
Page count: 7