MedScan, a natural language processing engine for MEDLINE abstracts

Cited by: 149
Authors
Novichkova, S [1]
Egorov, S [1]
Daraselia, N [1]
Affiliations
[1] Ariadne Genom Inc, Rockville, MD 20850 USA
Keywords
DOI
10.1093/bioinformatics/btg207
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The importance of extracting biomedical information from scientific publications is well recognized. A number of information extraction systems for the biomedical domain have been reported, but none of them have become widely used in practical applications. Most proposals to date make rather simplistic assumptions about the syntactic aspect of natural language. There is an urgent need for a system that has broad coverage and performs well in real-text applications.
Results: We present a general biomedical domain-oriented NLP engine called MedScan that efficiently processes sentences from MEDLINE abstracts and produces a set of regularized logical structures representing the meaning of each sentence. The engine utilizes a specially developed context-free grammar and lexicon. Preliminary evaluation of the system's performance, accuracy, and coverage exhibited encouraging results. Further approaches for increasing the coverage and reducing parsing ambiguity of the engine, as well as its application for information extraction, are discussed.
Pages: 1699-1706
Page count: 8