ALICE: An algorithm to extract abbreviations from MEDLINE

Cited by: 46
Authors
Ao, H
Takagi, TI
Affiliations
[1] Univ Tokyo, Dept Computat Biol, Kashiwa, Chiba 2778561, Japan
[2] Kanebo Cosmet Inc, Basic Res Lab, Kanagawa, Japan
Keywords
DOI
10.1197/jamia.M1757
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Objective: To help biomedical researchers recognize dynamically introduced abbreviations in biomedical literature, such as gene and protein names, we have constructed a support system called ALICE (Abbreviation LIfter using Corpus-based Extraction). ALICE aims to extract all types of abbreviations with their expansions from a target paper on the fly. Methods: ALICE extracts an abbreviation and its expansion from the literature by using heuristic pattern-matching rules. The system consists of three phases and can potentially identify 320 valid abbreviation-expansion patterns as combinations of the rules. Results: It achieved 95% recall and 97% precision on randomly selected titles and abstracts from the MEDLINE database. Conclusion: ALICE extracted abbreviations and their expansions from the literature efficiently. The subtly compiled heuristics enabled it to extract abbreviations with high recall without significantly reducing precision. ALICE not only facilitates recognition of an undefined abbreviation in a paper by constructing an abbreviation database or dictionary, but also makes biomedical literature retrieval more accurate.
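The abstract does not list ALICE's actual rules or its three phases, so the following is only a minimal sketch of the general idea of heuristic pattern matching for abbreviation-expansion pairs. It assumes a single, much simpler rule than ALICE uses: a short token in parentheses is treated as an abbreviation, and the words immediately before it are accepted as the expansion only if their initials spell the abbreviation. The names `PAREN`, `find_expansion`, and `extract_pairs` are hypothetical and do not come from the paper.

```python
import re

# Assumed pattern (not from the paper): a short alphanumeric token in parentheses.
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def find_expansion(words, abbr):
    """Return the trailing words whose initials spell `abbr`, or None."""
    letters = [c.lower() for c in abbr if c.isalnum()]
    n = len(letters)
    if n == 0 or n > len(words):
        return None
    candidate = words[-n:]
    # Accept only if every word's first character matches the abbreviation.
    if all(w[0].lower() == c for w, c in zip(candidate, letters)):
        return " ".join(candidate)
    return None


def extract_pairs(text):
    """Collect (abbreviation, expansion) pairs found in `text`."""
    pairs = []
    for m in PAREN.finditer(text):
        abbr = m.group(1)
        # Words that precede the opening parenthesis.
        words = re.findall(r"[A-Za-z0-9]+", text[:m.start()])
        expansion = find_expansion(words, abbr)
        if expansion is not None:
            pairs.append((abbr, expansion))
    return pairs


if __name__ == "__main__":
    sample = "We studied mitogen activated protein kinase (MAPK) signaling."
    print(extract_pairs(sample))
```

A single first-letter rule like this misses many real abbreviation types (inner letters, skipped stop words, reordered terms), which is presumably why a system such as ALICE combines many rules into a large set of patterns.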
Pages: 576-586
Number of pages: 11
Related papers
11 items in total
  • [1] Creating an online dictionary of abbreviations from MEDLINE
    Chang, JT
    Schütze, H
    Altman, RB
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2002, 9 (06) : 612 - 620
  • [2] LARKEY LS, 2000, P 5 ACM INT C DIG LI, P205
  • [3] Liu HF, 2002, AMIA 2002 SYMPOSIUM, PROCEEDINGS, P464
  • [4] Liu Hongfang, 2003, Pac Symp Biocomput, P415
  • [5] Park Y, 2001, PROCEEDINGS OF THE 2001 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, P126
  • [6] Pustejovsky James, 2001, MEDINFO, V10, P371
  • [7] Schwartz Ariel S, 2003, Pac Symp Biocomput, P451
  • [8] Heuristics for identification of acronym-definition patterns within text: Towards an automated construction of comprehensive acronym-definition dictionaries
    Wren, JD
    Garner, HR
    [J]. METHODS OF INFORMATION IN MEDICINE, 2002, 41 (05) : 426 - 434
  • [9] Yeates Stuart, 1999, The third New Zealand computer science research students'conference, P117
  • [10] PNAD-CSS: a workbench for constructing a protein name abbreviation dictionary
    Yoshida, M
    Fukuda, K
    Takagi, T
    [J]. BIOINFORMATICS, 2000, 16 (02) : 169 - 175