Filtering high-throughput protein-protein interaction data using a combination of genomic features

Cited by: 124
Authors
Patil, A
Nakamura, H
Affiliations
[1] Osaka Univ, Inst Prot Res, Suita, Osaka 5650871, Japan
[2] Osaka Univ, Grad Sch Sci, Dept Biol, Toyonaka, Osaka 5600043, Japan
DOI
10.1186/1471-2105-6-100
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Background: Protein-protein interaction data used to create or predict molecular networks are usually obtained from large-scale or high-throughput experiments. These experimental data are liable to contain a large number of spurious interactions. Hence, the interactions need to be validated and the incorrect data filtered out before they are used in prediction studies.
Results: In this study, we use a combination of three genomic features (structurally known interacting Pfam domains, Gene Ontology annotations and sequence homology) to assign reliability to the protein-protein interactions in Saccharomyces cerevisiae determined by high-throughput experiments. Using Bayesian network approaches, we show that protein-protein interactions from high-throughput data that are supported by one or more genomic features have a higher likelihood ratio and hence are more likely to be real interactions. Our method has a high sensitivity (90%) and good specificity (63%). We show that 56% of the interactions from high-throughput experiments in Saccharomyces cerevisiae have high reliability. We use the method to estimate the proportion of true interactions in the high-throughput protein-protein interaction data sets in Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens to be 27%, 18% and 68%, respectively. Our results are available for searching and downloading at http://helix.protein.osaka-u.ac.jp/htp/.
Conclusion: A combination of genomic features that includes sequence, structure and annotation information is a good predictor of true interactions in large and noisy high-throughput data sets. The method has a very high sensitivity and good specificity and can be used to assign a likelihood ratio, corresponding to the reliability, to each interaction.
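The idea of scoring an interaction by a likelihood ratio built from several genomic features can be illustrated with a minimal sketch. It assumes a naive Bayesian combination, in which the features are treated as conditionally independent; the abstract does not say which network structure the authors used. The feature names, the probabilities in `FEATURE_PROBS`, and both function names are hypothetical and are not taken from the paper.

```python
from math import prod

# Hypothetical conditional probabilities, NOT values from the paper:
# (P(feature present | true interaction), P(feature present | false interaction))
FEATURE_PROBS = {
    "pfam_domain_pair": (0.30, 0.02),
    "shared_go_annotation": (0.80, 0.20),
    "homologous_interaction": (0.40, 0.05),
}


def likelihood_ratio(evidence):
    """Combine per-feature likelihood ratios for one interaction.

    `evidence` maps a feature name to True (present) or False (absent).
    Multiplying the ratios assumes the features are conditionally independent.
    """
    ratios = []
    for name, present in evidence.items():
        p_true, p_false = FEATURE_PROBS[name]
        if present:
            ratios.append(p_true / p_false)
        else:
            ratios.append((1 - p_true) / (1 - p_false))
    return prod(ratios)


def sensitivity_specificity(scores, labels, cutoff):
    """Call an interaction reliable when its score exceeds `cutoff`.

    `labels` holds the known truth (True = real interaction) for each score.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y and s > cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y and s <= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s <= cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


supported = likelihood_ratio({name: True for name in FEATURE_PROBS})
unsupported = likelihood_ratio({name: False for name in FEATURE_PROBS})
```

With these made-up probabilities, an interaction supported by all three features receives a likelihood ratio well above 1 and an interaction supported by none receives a ratio below 1, which matches the qualitative claim in the abstract. Sensitivity and specificity such as the reported 90% and 63% would come from applying a cutoff of this kind to interactions whose true status is known.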
Pages: 13