Bayesian inference of protein-protein interactions from biological literature

Cited by: 49
Authors
Chowdhary, Rajesh [1 ,2 ]
Zhang, Jinfeng [3 ]
Liu, Jun S. [1 ]
Affiliations
[1] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
[2] MCRF BIRC, Marshfield Clin Marshfield Ctr, Marshfield, WI 54449 USA
[3] Florida State Univ, Dept Stat, Tallahassee, FL 32306 USA
Keywords
INTERACTION DATABASE; TEXT; EXTRACTION; INFORMATION; UPDATE; ABSTRACTS; PATTERNS; NETWORK;
DOI
10.1093/bioinformatics/btp245
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Protein-protein interaction (PPI) extraction from published biological articles has attracted much attention because of the importance of protein interactions in biological processes. Despite significant progress, mining PPIs from the literature still relies heavily on time- and resource-consuming manual annotation. Results: In this study, we developed a novel methodology based on Bayesian networks (BNs) for extracting PPI triplets (a PPI triplet consists of two protein names and the corresponding interaction word) from unstructured text. The method achieved an overall accuracy of 87% on a cross-validation test using a manually annotated dataset. We also showed, by extracting PPI triplets from a large number of PubMed abstracts, that our method was able to complement human annotation and extract a large number of new PPIs from the literature.
Pages: 1536-1542
Number of pages: 7