Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome

Cited by: 161
Authors
Ramani, AK
Bunescu, RC
Mooney, RJ [1]
Marcotte, EM
Affiliations
[1] Univ Texas, Dept Comp Sci, Austin, TX 78712 USA
[2] Univ Texas, Ctr Syst & Synthet Biol, Austin, TX 78712 USA
[3] Univ Texas, Inst Mol & Cellular Biol, Austin, TX 78712 USA
[4] Univ Texas, Dept Chem & Biochem, Austin, TX 78712 USA
Source
GENOME BIOLOGY | 2005 / Vol. 6 / Issue 05
DOI
10.1186/gb-2005-6-5-R40
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Extensive protein interaction maps are being constructed for yeast, worm, and fly to ask how the proteins organize into pathways and systems, but no such genome-wide interaction map yet exists for the set of human proteins. To prepare for studies in humans, we wished to establish tests for the accuracy of future interaction assays and to consolidate the known interactions among human proteins.

Results: We established two tests of the accuracy of human protein interaction datasets and measured the relative accuracy of the available data. We then developed and applied natural language processing and literature-mining algorithms to recover from Medline abstracts 6,580 interactions among 3,737 human proteins. A three-part algorithm was used: first, human protein names were identified in Medline abstracts using a discriminator based on conditional random fields; second, interactions were identified by the co-occurrence of protein names across the set of Medline abstracts; third, the interactions were filtered with a Bayesian classifier to enrich for legitimate physical interactions. These mined interactions were combined with existing interaction data to obtain a network of 31,609 interactions among 7,748 human proteins, accurate to the same degree as the existing datasets.

Conclusion: These interactions and the accuracy benchmarks will aid interpretation of current functional genomics data and provide a basis for determining the quality of future large-scale human protein interaction assays. Projecting from the approximately 15 interactions per protein in the best-sampled interaction set to the estimated 25,000 human genes implies more than 375,000 interactions in the complete human protein interaction network. This set therefore represents no more than 10% of the complete network.
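
The projection at the end of the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration of that calculation using the figures quoted in the abstract; the constant and function names are hypothetical and are not taken from the paper, and the simple product is the abstract's own estimate rather than a refined model of the network.

```python
# Figures quoted in the abstract; names are illustrative only.
INTERACTIONS_PER_PROTEIN = 15       # approximate rate in the best-sampled set
ESTIMATED_HUMAN_GENES = 25_000      # estimated number of human genes
CONSOLIDATED_INTERACTIONS = 31_609  # size of the consolidated network


def projected_network_size(per_protein: int, n_genes: int) -> int:
    """Naive projection: per-protein interaction rate times gene count."""
    return per_protein * n_genes


def coverage(known: int, projected: int) -> float:
    """Fraction of the projected network covered by the known interactions."""
    return known / projected


projected = projected_network_size(INTERACTIONS_PER_PROTEIN, ESTIMATED_HUMAN_GENES)
fraction = coverage(CONSOLIDATED_INTERACTIONS, projected)

print(f"projected interactions: {projected:,}")  # projected interactions: 375,000
print(f"coverage: {fraction:.1%}")               # coverage: 8.4%
```

With these inputs the consolidated set covers about 8.4% of the projected network, which is consistent with the abstract's statement of no more than 10%.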
Pages: 12