Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction

Cited by: 17
Authors
Santos, C [1]
Eggle, D [1]
States, DJ [1]
Affiliations
[1] Univ Michigan, Bioinformat Program, Ann Arbor, MI 48109 USA
DOI
10.1093/bioinformatics/bti165
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704
Abstract
Motivation: Wnt signaling is a very active area of research, with highly relevant publications appearing at a rate of more than one per day. Building and maintaining databases describing signal transduction networks is a time-consuming and demanding task that requires careful literature analysis and extensive domain-specific knowledge. For instance, more than 50 factors involved in Wnt signal transduction had been identified as of late 2003. In this work we describe a natural language processing (NLP) system that identifies references to biological interaction networks in free text and automatically assembles a protein association and interaction map.
Results: A 'gold standard' set of names and assertions, including 53 interactions involved in Wnt signaling, was derived by manual scanning of the Wnt genes website (http://www.stanford.edu/~rnusse/wntwindow.html). The system was used to analyze a corpus of peer-reviewed articles related to Wnt signaling, including 3369 PubMed and 1230 full-text papers. Names for key Wnt-pathway associated proteins and biological entities were identified using a chi-squared analysis of noun phrases over-represented in the Wnt literature as compared to the general signal transduction literature. Interestingly, we identified several instances where generic terms were used on the website when more specific terms occur in the literature, and one typographic error in the canonical Wnt pathway diagram. Using the named entity list and performing an exhaustive assertion extraction of the corpus, 34 of the 53 interactions in the 'gold standard' Wnt signaling set were successfully identified (64% recall). In addition, the automated extraction found several interactions involving key Wnt-related molecules that were missing from, or different from those in, the canonical diagram, and these were confirmed by manual review of the text.
These results suggest that a combination of NLP techniques for information extraction can form a useful first-pass tool for assisting human annotation and maintenance of signal pathway databases.
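The abstract names two quantitative steps: a chi-squared test for noun phrases over-represented in the Wnt literature relative to a background corpus, and recall against the gold-standard interaction set. The Python sketch below illustrates both under stated assumptions. It is not the authors' implementation: the function names, the 2x2 contingency layout, the 3.84 cutoff (the 5% critical value for one degree of freedom) and the toy phrase lists are all hypothetical choices made for illustration.

```python
from collections import Counter


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def overrepresented_phrases(target_phrases, background_phrases, threshold=3.84):
    """Return (phrase, score) pairs over-represented in the target corpus.

    Both arguments are flat lists of already-extracted noun phrases
    (assumption: phrase extraction happened upstream).
    """
    target = Counter(target_phrases)
    background = Counter(background_phrases)
    n_target = sum(target.values())
    n_background = sum(background.values())
    scored = []
    for phrase, a in target.items():
        c = background.get(phrase, 0)
        # Skip phrases that are not relatively more frequent in the target.
        if a * n_background <= c * n_target:
            continue
        # Rows: target vs. background corpus; columns: this phrase vs. all others.
        score = chi_squared_2x2(a, n_target - a, c, n_background - c)
        if score >= threshold:
            scored.append((phrase, score))
    return sorted(scored, key=lambda item: -item[1])


def recall(found, gold):
    """Fraction of gold-standard items that were found."""
    gold = set(gold)
    return len(set(found) & gold) / len(gold)


if __name__ == "__main__":
    # Toy corpora, invented purely for illustration.
    wnt = ["beta-catenin"] * 30 + ["kinase"] * 20 + ["cell"] * 50
    general = ["beta-catenin"] * 2 + ["kinase"] * 48 + ["cell"] * 50
    print(overrepresented_phrases(wnt, general))
    # 34 of 53 gold-standard interactions recovered, as reported in the abstract.
    print(f"{recall(range(34), range(53)):.0%}")
```

With the toy data, only "beta-catenin" passes the filter, and the recall line reproduces the reported 34/53, about 64%.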
Pages: 1653-1658
Page count: 6