PNAD-CSS: a workbench for constructing a protein name abbreviation dictionary

Cited by: 44
Authors
Yoshida, M [1]
Fukuda, K [1]
Takagi, T [1]
Affiliations
[1] Univ Tokyo, Inst Med Sci, Ctr Human Genome, Minato Ku, Tokyo 1088639, Japan
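Funding
Japan Society for the Promotion of Science
DOI
10.1093/bioinformatics/16.2.169
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract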
Motivation: Since their initial development, the integration and construction of databases for molecular-level data have progressed. Although biological molecules are related to each other and form a complex system, the information about them is stored in the vast archives of the literature or in diverse databases. There is no unified naming convention for biological objects, and biological terms may be ambiguous or polysemic. This makes the integration and interaction of databases difficult. To eliminate these problems, machine-readable natural language resources appear to be quite promising. We have developed a workbench for building a protein name abbreviation dictionary (PNAD).
Results: We have developed the PNAD Construction Support System (PNAD-CSS), which offers various convenient facilities to decrease the cost of constructing a protein name abbreviation dictionary whose entries are collected from abstracts of biomedical papers. The system allows users to concentrate on higher-level interpretation by removing some troublesome tasks, e.g. managing abstracts and extracting protein names and their abbreviations. To extract pairs of protein names and abbreviations, we have developed a hybrid system composed of the PROPER System and the PNAD System. The PNAD System can extract the pairs from parenthetical paraphrases involving protein names; the PROPER System identified these pairs with 98.95% precision, 95.56% recall and 97.58% complete precision.
Availability: The PROPER System is freely available from http://www.hgc.ims.u-tokyo.ac.jp/service/tooldoc/KeX/intro.html. The other software is also available on request from the authors.
Contact: mikio@ims.u-tokyo.ac.jp.
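To illustrate what extracting a name and abbreviation pair from a parenthetical paraphrase can look like, here is a minimal Python sketch. It is not the PNAD or PROPER algorithm, which the abstract does not describe in detail. It assumes a simple first-letter heuristic, and the names `PAREN`, `matches_initials` and `extract_pairs` are hypothetical.

```python
import re

# Hypothetical illustration only: NOT the PNAD or PROPER algorithm.
# Assumption: an abbreviation is a short token in parentheses whose
# characters are the initials of the words directly before it.
PAREN = re.compile(r"\(([A-Za-z0-9-]{2,10})\)")


def matches_initials(words, abbr):
    """Return True if the alphanumeric characters of abbr are the initials of words."""
    letters = [c.lower() for c in abbr if c.isalnum()]
    return len(words) == len(letters) and all(
        w[0].lower() == c for w, c in zip(words, letters)
    )


def extract_pairs(text):
    """Collect (long form, abbreviation) pairs found in parenthetical paraphrases."""
    pairs = []
    for m in PAREN.finditer(text):
        abbr = m.group(1)
        # Words that precede the opening parenthesis.
        preceding = re.findall(r"[A-Za-z0-9]+", text[: m.start()])
        n = sum(c.isalnum() for c in abbr)
        candidate = preceding[-n:]
        if candidate and matches_initials(candidate, abbr):
            pairs.append((" ".join(candidate), abbr))
    return pairs


sentence = (
    "We studied epidermal growth factor receptor (EGFR) "
    "and protein kinase C (PKC)."
)
print(extract_pairs(sentence))
```

A first-letter heuristic like this misses many real cases, for example abbreviations such as IL-2 where one word contributes two characters, which is one reason a dedicated protein name recognizer is useful alongside the pairing step.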
Pages: 169-175
Number of pages: 7