Automated extraction of information on protein-protein interactions from the biological literature

Cited by: 224
Authors
Ono, T
Hishigaki, H
Tanigami, A
Takagi, T
Affiliations
[1] Univ Tokyo, Inst Med Sci, Ctr Human Genome, Minato Ku, Tokyo 1088639, Japan
[2] Otsuka Pharmaceut Co Ltd, Otsuka GEN Res Inst, Kawaguchi, Tokushima 7710192, Japan
DOI
10.1093/bioinformatics/17.2.155
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: To understand biological processes, we must clarify how proteins interact with each other. However, because information about protein-protein interactions still exists primarily in the scientific literature, it is not accessible in a computer-readable format. Efficient processing of large numbers of interactions therefore requires an intelligent information extraction method. Our aim is to develop an efficient method for extracting information on protein-protein interactions from the scientific literature.
Results: We present a method for extracting information on protein-protein interactions from the scientific literature. This method, which employs only a protein name dictionary, surface clues on word patterns and simple part-of-speech rules, achieved high recall and precision rates for yeast (recall = 86.8% and precision = 94.3%) and Escherichia coli (recall = 82.5% and precision = 93.5%). The extraction results suggest that the method should be applicable to any species for which a protein name dictionary has been constructed.
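The general idea of combining a protein name dictionary with surface word patterns, and of scoring the output by precision and recall, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary entries, the interaction verbs, the regular expression, and the function names `extract_interactions` and `precision_recall` are all hypothetical, and the part-of-speech rules mentioned in the abstract are not modeled at all.

```python
import re

# Hypothetical toy dictionary; the dictionaries used in the paper are not reproduced here.
PROTEIN_NAMES = {"Cdc28", "Cln2", "Sic1"}

# Hypothetical surface clue: a few interaction verbs, optionally followed by "with" or "to".
INTERACTION_WORDS = r"(?:interacts?|binds?|associates?)"


def extract_interactions(sentence, names=PROTEIN_NAMES):
    """Return (protein_a, protein_b) pairs matched by a single surface pattern."""
    # Longest names first so that a longer name is preferred over its own prefix.
    name_alt = "|".join(sorted((re.escape(n) for n in names), key=len, reverse=True))
    pattern = re.compile(
        rf"\b(?P<a>{name_alt})\b\s+{INTERACTION_WORDS}\s+(?:with\s+|to\s+)?(?P<b>{name_alt})\b"
    )
    return [(m.group("a"), m.group("b")) for m in pattern.finditer(sentence)]


def precision_recall(predicted, gold):
    """Precision = TP / |predicted|, recall = TP / |gold|, over sets of pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    pairs = extract_interactions("Cdc28 interacts with Cln2 in vivo.")
    print(pairs)
    print(precision_recall(pairs, [("Cdc28", "Cln2"), ("Sic1", "Cdc28")]))
```

A single regular expression like this misses passive constructions, negation, and coordinated names; it only shows how dictionary lookup and word-pattern matching fit together before evaluation.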
Pages: 155-161
Page count: 7