Automated extraction of information on protein-protein interactions from the biological literature

Cited by: 224
Authors
Ono, T
Hishigaki, H
Tanigami, A
Takagi, T
Affiliations
[1] Univ Tokyo, Inst Med Sci, Ctr Human Genome, Minato Ku, Tokyo 1088639, Japan
[2] Otsuka Pharmaceut Co Ltd, Otsuka GEN Res Inst, Kawaguchi, Tokushima 7710192, Japan
DOI
10.1093/bioinformatics/17.2.155
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: To understand biological processes, we must clarify how proteins interact with each other. However, because information about protein-protein interactions still exists primarily in the scientific literature, it is not accessible in a computer-readable format. Efficient processing of large numbers of interactions therefore requires an intelligent information extraction method. Our aim is to develop an efficient method for extracting information on protein-protein interactions from the scientific literature.
Results: We present a method for extracting information on protein-protein interactions from the scientific literature. This method, which employs only a protein name dictionary, surface clues on word patterns and simple part-of-speech rules, achieved high recall and precision rates for yeast (recall = 86.8% and precision = 94.3%) and Escherichia coli (recall = 82.5% and precision = 93.5%). The extraction results suggest that our method should be applicable to any species for which a protein name dictionary is constructed.
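The abstract names three ingredients (a protein name dictionary, surface clues on word patterns, and part-of-speech rules) and reports results as recall and precision. The following is a minimal sketch of the first two ingredients plus the evaluation arithmetic, not the authors' implementation. The toy dictionary, the interaction phrases, the example sentences, the gold-standard pairs and all function names are assumptions made for illustration, and the part-of-speech rules are omitted entirely.

```python
import re

# Hypothetical toy dictionary; the paper's actual dictionary is not reproduced here.
PROTEIN_NAMES = {"Cdc28", "Cln2", "Sic1", "Far1"}

# Hypothetical interaction phrases standing in for the paper's surface clues.
INTERACTION_WORDS = r"(?:interacts? with|binds? to|associates? with)"


def build_pattern(names):
    """Compile a 'PROTEIN <interaction phrase> PROTEIN' pattern from a dictionary."""
    # Longest names first so that a longer name wins over a shorter prefix of it.
    alternation = "|".join(
        re.escape(n) for n in sorted(names, key=len, reverse=True)
    )
    return re.compile(
        rf"\b({alternation})\b\s+{INTERACTION_WORDS}\s+\b({alternation})\b"
    )


def extract_interactions(sentence, pattern):
    """Return the set of (protein_a, protein_b) pairs matched in one sentence."""
    return {(m.group(1), m.group(2)) for m in pattern.finditer(sentence)}


def precision_recall(predicted, gold):
    """Precision = TP / |predicted|; recall = TP / |gold|."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


pattern = build_pattern(PROTEIN_NAMES)

# Invented example sentences, used only to exercise the pattern.
sentences = [
    "Cdc28 interacts with Cln2 during G1.",
    "Sic1 binds to Cdc28 and inhibits it.",
    "Far1 was not detected in this assay.",
]

predicted = set()
for sentence in sentences:
    predicted |= extract_interactions(sentence, pattern)

# Invented gold standard; the third pair is deliberately not stated in the text.
gold = {("Cdc28", "Cln2"), ("Sic1", "Cdc28"), ("Far1", "Cdc28")}
precision, recall = precision_recall(predicted, gold)
print(sorted(predicted), precision, recall)
```

On this toy input the pattern finds two of the three gold pairs and proposes no wrong pair, so precision is 1.0 and recall is 2/3. These figures only illustrate how the two measures are computed and say nothing about the rates reported in the abstract.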
Pages: 155-161
Number of pages: 7