An ENSEMBLE machine learning approach for the prediction of all-alpha membrane proteins

Cited by: 63
Authors
Martelli, Pier Luigi [1]
Fariselli, Piero [1]
Casadio, Rita [1]
Affiliations
[1] Univ Bologna, Lab Biocomp, CIRB, Dept Biol, I-40126 Bologna, Italy
DOI
10.1093/bioinformatics/btg1027
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: All-alpha membrane proteins constitute a functionally relevant subset of the whole proteome. Their content ranges from about 10 to 30% of the cell proteins, based on sequence comparison and specific predictive methods. Due to the paucity of membrane proteins solved at atomic resolution, the training/testing sets of predictive methods for protein topography and topology routinely include very few well-solved structures mixed with a hundred proteins known at low resolution. Moreover, available predictors fail in predicting recently crystallised membrane proteins (Chen et al., 2002). Presently the number of well-solved membrane proteins comprises some 59 chains of low sequence homology. It is therefore possible to train/test predictors only with the set of proteins known at atomic resolution and to evaluate more thoroughly the performance of different methods.
Results: We implement a cascade-neural network (NN), two different hidden Markov models (HMM), and their ensemble (ENSEMBLE) as a new method. We train and test in cross-validation the three methods and ENSEMBLE on the 59 well-resolved membrane proteins. ENSEMBLE scores a per-protein accuracy of 90% for topography and 71% for topology, outperforming the best single method by 7 and 5 percentage points, respectively. When tested on a low-resolution set of 151 proteins, with no homology with the 59 proteins, the per-protein accuracy of ENSEMBLE is 76% for topography and 68% for topology. Our results also indicate that the performance of ENSEMBLE is higher than that of the best predictors presently available on the Web.
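The abstract does not say how the three predictors are combined or how a protein is scored as correct, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each predictor emits one label per residue ('T' for transmembrane, 'L' for loop), combines them by a simple per-residue majority vote, and counts a protein as correct for topography when the predicted and observed segments match in number and each pair overlaps by a minimum number of residues. The function names, the label alphabet, and the `min_overlap` threshold are all hypothetical.

```python
from collections import Counter
from itertools import groupby


def majority_vote(predictions):
    """Combine equal-length per-residue label strings by majority vote.

    Assumption: an odd number of predictors and two labels, so no ties occur.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*predictions)
    )


def segments(labels, state="T"):
    """Return half-open (start, end) index pairs for each run of `state`."""
    found, position = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label == state:
            found.append((position, position + length))
        position += length
    return found


def protein_correct(predicted, observed, min_overlap=3):
    """Hypothetical topography criterion: same segment count, and each
    predicted segment overlaps its observed counterpart sufficiently."""
    pred, obs = segments(predicted), segments(observed)
    if len(pred) != len(obs):
        return False
    return all(
        min(p_end, o_end) - max(p_start, o_start) >= min_overlap
        for (p_start, p_end), (o_start, o_end) in zip(pred, obs)
    )


def per_protein_accuracy(pairs, min_overlap=3):
    """Fraction of (predicted, observed) pairs judged correct."""
    pairs = list(pairs)
    return sum(protein_correct(p, o, min_overlap) for p, o in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy labels standing in for the NN and the two HMMs (invented data).
    consensus = majority_vote(["LLTTTTTLLL", "LLLTTTTTLL", "LTTTTTLLLL"])
    print(consensus, per_protein_accuracy([(consensus, "LLLTTTTTLL")]))
```

A per-protein score of this kind is stricter than per-residue accuracy, since a single missed or spurious segment marks the whole protein as wrong.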
Pages: i205-i211
Number of pages: 7
Related papers
23 in total
[1] Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 1997, 25(17): 3389-3402.
[2] Chen CP, Kernytsky A, Rost B. Transmembrane helix predictions revisited. Protein Science, 2002, 11(12): 2774-2791.
[3] Chen CP, Rost B. Long membrane helices and short loops predicted less accurately. Protein Science, 2002, 11(12): 2766-2773.
[4] Durbin R., 1998, BIOL SEQUENCE ANAL P
[5] Fariselli P, Finelli M, Marchignoli D, Martelli PL, Rossi I, Casadio R. MaxSubSeq: an algorithm for segment-length optimization. The case study of the transmembrane spanning segments. Bioinformatics, 2003, 19(4): 500-505.
[6] Jacoboni I, Martelli PL, Fariselli P, De Pinto V, Casadio R. Prediction of the transmembrane regions of β-barrel membrane proteins with a neural network-based predictor. Protein Science, 2001, 10(4): 779-787.
[7] Jayasinghe S, Hristova K, White SH. MPtopo: A database of membrane protein topology. Protein Science, 2001, 10(2): 455-458.
[8] Jones DT, Taylor WR, Thornton JM. A model recognition approach to the prediction of all-helical membrane-protein structure and topology. Biochemistry, 1994, 33(10): 3038-3049.
[9] Krogh A, Larsson B, von Heijne G, Sonnhammer ELL. Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. Journal of Molecular Biology, 2001, 305(3): 567-580.
[10] Martelli Pier Luigi, 2002, Bioinformatics, V18 Suppl 1, pS46