Using GO-PseAA predictor to identify membrane proteins and their types

Cited by: 60
Authors
Chou, KC [2]
Cai, YD
Affiliations
[1] Tianjin Inst Bioinformat & Drug Discovery, Tianjin, Peoples R China
[2] Gordon Life Sci Inst, San Diego, CA 92130 USA
[3] Univ Manchester, Inst Sci & Technol, Dept Biomed Sci, Manchester M60 1QD, Lancs, England
Keywords
type-1; type-2; multi-pass transmembrane; lipid chain-anchored; GPI-anchored; gene ontology; pseudo amino acid composition
DOI
10.1016/j.bbrc.2004.12.069
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Cell membranes are crucial to the life of a cell. Although the basic structure of biological membranes is provided by the lipid bilayer, most of the specific functions are carried out by membrane proteins. Knowledge of membrane protein type often offers important clues toward determining the function of an uncharacterized protein. Therefore, predicting the type of a membrane protein from its primary sequence, or even just identifying whether an uncharacterized protein is a membrane protein or not, is an important and challenging problem in bioinformatics and proteomics. To deal with these problems, the GO-PseAA predictor is introduced, which operates in a hybridization space formed by combining gene ontology and pseudo amino acid composition. To test the prediction quality, a dataset was constructed that contains 6476 non-membrane proteins and 5122 membrane proteins classified into five different types (Online Supplementary Materials A). To avoid redundancy and bias, none of the proteins included has 40% or greater sequence identity to any other. The overall success rate by the jackknife cross-validation test in distinguishing membrane proteins from non-membrane proteins was 94.76%, and that in identifying the five membrane protein types was 95.84%. The high success rates suggest that the GO-PseAA predictor can catch the core features of the statistical samples concerned and may become an automated high-throughput tool in molecular and cell biology. © 2004 Elsevier Inc. All rights reserved.
Pages: 845-847
Page count: 3