Predicting protein-protein interactions by a supervised learning classifier

Cited by: 5
Authors
Huang, Y
Frishman, D [1]
Muchnik, I
Affiliations
[1] Tech Univ Munich, Wissensch Zentrum Weihenstephan, Dept Genome Oriented Bioinformat, D-85354 Freising Weihenstephan, Germany
[2] Rutgers State Univ, Dept Comp Sci, New Brunswick, NJ 08903 USA
Keywords
protein-protein interactions; SVM learning; protein domains; genome analysis
DOI
10.1016/j.compbiolchem.2004.07.003
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification code
07 [Science]; 0710 [Biology]; 09 [Agriculture]
Abstract
Reliable prediction of protein-protein interactions from sequence information is a major challenge in computational biology. Assuming that the likelihood of two proteins interacting is associated with their structural domain composition and functional roles, we recast the prediction of protein interactions as a classification problem. We developed a heuristic to generate training and test pairs, and then designed a new feature space to represent the training data. In particular, we propose a new method for constructing a negative data set in which the functional and structural properties of putatively non-interacting proteins closely resemble those of proteins known to interact. A support vector machine (SVM) was used to classify interacting and non-interacting protein pairs in Saccharomyces cerevisiae and to search for optimal training parameters. In a 10-fold cross-validation experiment, the system predicted whether two yeast proteins interact with an accuracy of 79%.
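The abstract describes the pipeline only at a high level, so the following is a minimal, hypothetical sketch of its overall shape, not the authors' implementation. It assumes each protein is annotated with a list of domain identifiers and encodes a pair as order-independent domain counts (the paper's actual feature space, which also draws on functional roles, is not reproduced). It samples negatives from proteins that occur in known interacting pairs, a simplification of the paper's resemblance criterion, and it substitutes a Pegasos-style linear SVM trained by stochastic sub-gradient descent for whatever SVM software and kernel the authors used. All function names are illustrative assumptions.

```python
import random


def pair_features(domains_a, domains_b, vocab):
    """Hypothetical encoding: count each vocabulary domain across both proteins.

    Summing the two proteins' counts makes the vector independent of pair order.
    """
    index = {d: i for i, d in enumerate(vocab)}
    x = [0.0] * len(vocab)
    for d in list(domains_a) + list(domains_b):
        if d in index:  # domains outside the vocabulary are ignored
            x[index[d]] += 1.0
    return x


def sample_negatives(positives, n, rng):
    """Draw n unobserved pairs among proteins seen in known interactions.

    Simplification (assumption): any unobserved pair is treated as non-interacting.
    """
    proteins = sorted({p for pair in positives for p in pair})
    known = {frozenset(pair) for pair in positives}
    max_pairs = len(proteins) * (len(proteins) - 1) // 2
    available = max_pairs - sum(1 for k in known if len(k) == 2)
    if n > available:
        raise ValueError("not enough unobserved pairs to sample from")
    negatives = set()
    while len(negatives) < n:
        pair = frozenset(rng.sample(proteins, 2))
        if pair not in known:
            negatives.add(pair)
    return sorted(tuple(sorted(p)) for p in negatives)


def _score(w, x):
    # The last weight is the bias, paired with an implicit constant feature 1.0.
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the L2-regularized hinge loss.

    Labels must be +1 / -1. For simplicity the bias is regularized like the
    other weights.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    data = list(zip(X, y))
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = label * _score(w, x) < 1  # margin check uses the current w
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink step from the regularizer
            if violated:
                xb = list(x) + [1.0]
                w = [wi + eta * label * xi for wi, xi in zip(w, xb)]
    return w


def predict(w, x):
    return 1 if _score(w, x) >= 0 else -1


def cross_val_accuracy(X, y, k=10, seed=0, **svm_kwargs):
    """Fraction of examples classified correctly when each of k folds is held out once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(k):
        held_out = idx[f::k]
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train], **svm_kwargs)
        correct += sum(predict(w, X[i]) == y[i] for i in held_out)
    return correct / len(X)
```

For example, `pair_features(["SH3", "SH3"], ["PDZ"], ["SH3", "PDZ", "WD40"])` yields `[2.0, 1.0, 0.0]` whichever protein is passed first. The regularization strength `lam` and `epochs` stand in for the training parameters the abstract says were searched; a linear kernel keeps the sketch dependency-free, and the abstract does not state which kernel the authors chose.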
Pages: 291-301
Page count: 11