Progress and challenges in predicting protein-protein interaction sites

Cited by: 126
Authors
Ezkurdia, Iakes [1, 2]
Bartoli, Lisa [3]
Fariselli, Piero [3]
Casadio, Rita [3]
Valencia, Alfonso [1, 4]
Tress, Michael L. [1]
Affiliations
[1] Spanish Natl Canc Res Ctr CNIO, Madrid 28029, Spain
[2] Ctr Nacl Biotecnol, Madrid, Spain
[3] Univ Bologna, Biocomp Grp, I-40126 Bologna, Italy
[4] CSIC, INTA, CAB, Ctr Astrobiol, Madrid, Spain
Keywords
protein-protein interaction; binding sites; protein complexes; prediction; machine learning; MOLECULAR RECOGNITION; BINDING-SITES; HOT-SPOTS; SECONDARY STRUCTURE; SEQUENCE PROFILE; FLEXIBLE NETS; RESIDUES; INTERFACES; DOCKING; COMPLEXES;
DOI
10.1093/bib/bbp021
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
The identification of protein-protein interaction sites is an essential intermediate step for mutant design and the prediction of protein networks. In recent years a significant number of methods have been developed to predict these interface residues and here we review the current status of the field. Progress in this area requires a clear view of the methodology applied, the data sets used for training and testing the systems, and the evaluation procedures. We have analysed the impact of a representative set of features and algorithms and highlighted the problems inherent in generating reliable protein data sets and in the posterior analysis of the results. Although it is clear that there have been some improvements in methods for predicting interacting sites, several major bottlenecks remain. Proteins in complexes are still under-represented in the structural databases and in particular many proteins involved in transient complexes are still to be crystallized. We provide suggestions for effective feature selection, and make it clear that community standards for testing, training and performance measures are necessary for progress in the field.
Pages: 233-246
Number of pages: 14