Learning anchor verbs for biological interaction patterns from published text articles

Cited by: 13
Authors
Hatzivassiloglou, V [1]
Weng, WB [1]
Affiliations
[1] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
Keywords
protein-protein interactions; protein-gene interactions; interaction verbs; computer analysis of biological text; text mining; machine learning
DOI
10.1016/S1386-5056(02)00054-0
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Much of the knowledge modeling in the molecular biology domain involves interactions between proteins, genes, various forms of RNA, small molecules, etc. Interactions between these substances are typically extracted and codified manually, which increases the cost and time of modeling and substantially limits the coverage of the resulting knowledge base. In this paper, we describe an automatic system that learns interaction verbs from text; these verbs can then form the core of automatically retrieved patterns that model classes of biological interactions. We investigate text features relating verbs to genes and proteins, and apply statistical tests and a logistic regression model to determine whether a given verb belongs to the class of interaction verbs. Our system, AVAD, achieves over 87% precision and 82% recall when tested on an 11 million word corpus of journal articles. In addition, we compare the automatically obtained results with a manually constructed database of interaction verbs and show that the automatic approach can significantly enrich the manual list by detecting rarer interaction verbs that were omitted from the database. © 2002 Elsevier Science Ireland Ltd. All rights reserved.
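As a rough illustration of the classification step the abstract describes, the Python sketch below fits a logistic regression to per-verb features and scores the predictions with precision and recall. It is not the AVAD implementation. The two features (the fraction of a verb's occurrences with a gene or protein name as subject, and as object), the toy values, and every function name are assumptions made for illustration only.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return True if the verb is classified as an interaction verb."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= threshold

def precision_recall(predicted, gold):
    """Precision and recall of boolean predictions against gold labels."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical toy data, NOT taken from the paper.
# Each row: (fraction of uses with a gene/protein subject,
#            fraction of uses with a gene/protein object)
verbs = ["bind", "phosphorylate", "inhibit", "show", "suggest", "describe"]
X = [(0.80, 0.70), (0.90, 0.85), (0.70, 0.75),
     (0.20, 0.10), (0.10, 0.15), (0.15, 0.05)]
y = [1, 1, 1, 0, 0, 0]  # 1 = interaction verb, 0 = other verb

w, b = train_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
precision, recall = precision_recall(predictions, [bool(label) for label in y])
```

A real system would derive such features from a parsed corpus and evaluate on verbs held out from training; the toy set here is evaluated on its own training data only to keep the sketch short.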
Pages: 19-32
Page count: 14