Integrating protein-protein interactions and text mining for protein function prediction

Cited by: 33
Authors
Jaeger, Samira [1, 2]
Gaudan, Sylvain [2]
Leser, Ulf [1]
Rebholz-Schuhmann, Dietrich [2]
Affiliations
[1] Humboldt Univ, D-10099 Berlin, Germany
[2] European Bioinformat Inst, Cambridge CB10 1SD, England
DOI
10.1186/1471-2105-9-S8-S2
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Functional annotation of proteins remains a challenging task. Currently, the scientific literature serves as the main source of as yet uncurated functional annotations, but curation work is slow and expensive, and automatic techniques that support this work still lack reliability. We developed a method to identify conserved protein interaction graphs and to predict missing protein functions from orthologs in these graphs. To enhance the precision of the results, we also implemented a procedure that validates all predictions against findings reported in the literature.
Results: Using this procedure, more than 80% of the GO annotations available in UniProtKB/Swiss-Prot for proteins with highly conserved orthologs could be verified automatically. For a subset of proteins, we predicted new GO annotations that were not available in UniProtKB/Swiss-Prot. All of these predictions were correct (100% precision) according to verification by a trained curator.
Conclusion: Our method of integrating CCSs (the conserved interaction graphs) and literature mining is thus a highly reliable approach to predicting GO annotations for weakly characterized proteins with orthologs.
Pages: 10