Kernel methods for predicting protein-protein interactions

Cited by: 384
Authors
Ben-Hur, A [1]
Noble, WS
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[2] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
DOI
10.1093/bioinformatics/bti1016
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Despite advances in high-throughput methods for discovering protein-protein interactions, the interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions.
Results: We present a kernel method for predicting protein-protein interactions using a combination of data sources, including protein sequences, Gene Ontology annotations, local properties of the network, and homologous interactions in other species. Whereas protein kernels proposed in the literature provide a similarity between single proteins, prediction of interactions requires a kernel between pairs of proteins. We propose a pairwise kernel that converts a kernel between single proteins into a kernel between pairs of proteins, and we illustrate the kernel's effectiveness in conjunction with a support vector machine classifier. Furthermore, we obtain improved performance by combining several sequence-based kernels based on k-mer frequency, motif and domain content and by further augmenting the pairwise sequence kernel with features that are based on other sources of data. We apply our method to predict physical interactions in yeast using data from the BIND database. At a false positive rate of 1% the classifier retrieves close to 80% of a set of trusted interactions. We thus demonstrate the ability of our method to make accurate predictions despite the sizeable fraction of false positives that are known to exist in interaction databases.
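The abstract describes converting a kernel between single proteins into a kernel between pairs of proteins but does not give the formula. The following is a minimal sketch of one common construction, a symmetrized product of base-kernel values, and is an assumption rather than a statement of the exact kernel in the paper. The function names `spectrum_kernel` and `pairwise_kernel`, the choice of k = 3, and the toy sequences are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Base kernel between two sequences: dot product of their k-mer counts."""
    counts_a = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    counts_b = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return float(sum(n * counts_b[kmer] for kmer, n in counts_a.items()))


def pairwise_kernel(pair1, pair2, base=spectrum_kernel) -> float:
    """Kernel between two protein pairs built from a single-protein kernel.

    Assumed form (not taken from the abstract):
        K((x1, x2), (y1, y2)) = K'(x1, y1) K'(x2, y2) + K'(x1, y2) K'(x2, y1)
    Summing both matchings makes the value independent of the order
    in which the two proteins of a pair are listed.
    """
    (x1, x2), (y1, y2) = pair1, pair2
    return base(x1, y1) * base(x2, y2) + base(x1, y2) * base(x2, y1)


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    p = ("MKVLAMKV", "GAVLKGAV")
    q = ("GAVLKMKV", "MKVLAGAV")
    print(pairwise_kernel(p, q))
    print(pairwise_kernel((p[1], p[0]), q))  # same value: order within a pair is irrelevant
```

A Gram matrix filled with such values over all training pairs could then be passed to a support vector machine that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.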
Pages: i38-i46
Page count: 9