Protein docking using surface matching and supervised machine learning

被引:22
作者
Bordner, Andrew J. [1 ]
Gorin, Andrey A. [1 ]
机构
[1] Oak Ridge Natl Lab, Comp Sci & Math Div, Oak Ridge, TN 37831 USA
关键词
Random Forest; evolutionary conservation; contacting residues; residue propensities; shape complementarity; protein complexes;
D O I
10.1002/prot.21406
中图分类号
Q5 [生物化学]; Q7 [分子生物学];
学科分类号
071010 ; 081704 ;
摘要
Computational prediction of protein complex structures through docking offers a means to gain a mechanistic understanding of protein interactions that mediate biological processes. This is particularly important as the number of experimentally determined structures of isolated proteins exceeds the number of structures of complexes. A comprehensive docking procedure is described in which efficient sampling of conformations is achieved by matching surface normal vectors, fast filtering for shape complementarity, clustering by RMSD, and scoring the docked conformations using a supervised machine learning approach. Contacting residue pair frequencies, residue propensities, evolutionary conservation, and shape complementarity score for each docking conformation are used as input data to a Random Forest classifier. The performance of the Random Forest approach for selecting correctly docked conformations was assessed by cross-validation using a nonredundant benchmark set of X-ray structures for 93 heterodimer and 733 homodimer complexes. The single highest rank docking solution was the correct (near-native) structure for slightly more than one third of the complexes. Furthermore, the fraction of highly ranked correct structures was significantly higher than the overall fraction of correct structures, for almost all complexes. A detailed analysis of the difficult to predict complexes revealed that the majority of the homodimer cases were explained by incorrect oligomeric state annotation. Evolutionary conservation and shape complementarity score as well as both underrepresented and overrepresented residue types and residue pairs were found to make the largest contributions to the overall prediction accuracy. Finally, the method was also applied to docking unbound subunit structures from a previously published benchmark set.
引用
收藏
页码:488 / 502
页数:15
相关论文
共 53 条
[1]   Ten thousand interactions for the molecular biologist [J].
Aloy, P ;
Russell, RB .
NATURE BIOTECHNOLOGY, 2004, 22 (10) :1317-1321
[2]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[3]   The universal protein resource (UniProt) [J].
Bairoch, Amos ;
Bougueleret, Lydie ;
Altairac, Severine ;
Amendolia, Valeria ;
Auchincloss, Andrea ;
Puy, Ghislaine Argoud ;
Axelsen, Kristian ;
Baratin, Delphine ;
Blatter, Marie-Claude ;
Boeckmann, Brigitte ;
Bollondi, Laurent ;
Boutet, Emmanuel ;
Quintaje, Silvia Braconi ;
Breuza, Lionel ;
Bridge, Alan ;
deCastro, Edouard ;
Coral, Danielle ;
Coudert, Elisabeth ;
Cusin, Isabelle ;
Dobrokhotov, Pavel ;
Dornevil, Dolnide ;
Duvaud, Severine ;
Estreicher, Anne ;
Famiglietti, Livia ;
Feuermann, Marc ;
Gehant, Sebastian ;
Farriol-Mathis, Nathalie ;
Ferro, Serenella ;
Gasteiger, Elisabeth ;
Gateau, Alain ;
Gerritsen, Vivienne ;
Gos, Arnaud ;
Gruaz-Gumowski, Nadine ;
Hinz, Ursula ;
Hulo, Chantal ;
Hulo, Nicolas ;
Ioannidis, Vassilios ;
Ivanyi, Ivan ;
James, Janet ;
Jain, Eric ;
Jimenez, Silvia ;
Jungo, Florence ;
Junker, Vivien ;
Keller, Guillaume ;
Lachaize, Corinne ;
Lane-Guermonprez, Lydie ;
Langendijk-Genevaux, Petra ;
Lara, Vicente ;
Lemercier, Philippe ;
Le Saux, Virginie .
NUCLEIC ACIDS RESEARCH, 2007, 35 :D193-D197
[4]   MULTIDIMENSIONAL BINARY SEARCH TREES USED FOR ASSOCIATIVE SEARCHING [J].
BENTLEY, JL .
COMMUNICATIONS OF THE ACM, 1975, 18 (09) :509-517
[5]   Modeling oligomers with Cn or Dn symmetry:: Application to CAPRI Target 10 [J].
Berchanski, A ;
Segal, D ;
Eisenstein, M .
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2005, 60 (02) :202-206
[6]   Construction of molecular assemblies via docking:: Modeling of tetramers with D2 symmetry [J].
Berchanski, A ;
Eisenstein, M .
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2003, 53 (04) :817-829
[7]   The Protein Data Bank [J].
Berman, HM ;
Westbrook, J ;
Feng, Z ;
Gilliland, G ;
Bhat, TN ;
Weissig, H ;
Shindyalov, IN ;
Bourne, PE .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :235-242
[8]   Statistical analysis and prediction of protein-protein interfaces [J].
Bordner, AJ ;
Abagyan, R .
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2005, 60 (03) :353-366
[9]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[10]   Are protein-protein interfaces more conserved in sequence than the rest of the protein surface? [J].
Caffrey, DR ;
Somaroo, S ;
Hughes, JD ;
Mintseris, J ;
Huang, ES .
PROTEIN SCIENCE, 2004, 13 (01) :190-202