Finding the evidence for protein-protein interactions from PubMed abstracts

Cited by: 42
Authors
Jang, Hyunchul
Lim, Jaesoo
Lim, Joon-Ho
Park, Soo-Jun
Lee, Kyu-Chul
Park, Seon-Hee
Affiliations
[1] Elect & Telecommun Res Inst, Bioinformat Team, Taejon 305350, South Korea
[2] Chungnam Natl Univ, Dept Comp Engn, Taejon 305764, South Korea
DOI
10.1093/bioinformatics/btl203
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Protein-protein interactions play critical roles in biological processes, and many biologists try to find or predict crucial information about them. Before verifying an interaction in the laboratory, it is necessary to check whether previous research already supports it. Although much effort has gone into creating databases that store verified information in structured form, much interaction information still remains in unstructured text. As the number of new publications has increased rapidly, a large body of research has sought to extract interactions from text automatically. However, various difficulties remain in incorporating automatically generated results into manually annotated databases. For interactions that are not found in manually curated databases, researchers attempt to search for abstracts or full papers.
Results: A search for two proteins in PubMed frequently returns hundreds of abstracts. This paper introduces a method that validates protein-protein interactions from PubMed abstracts. A query is generated automatically from two given proteins, and abstracts are then collected from PubMed. The target proteins and their synonyms are then recognized, and their interaction information is extracted from the collection. Of the interactions in the DIP-PPI corpus, 67.37% were found in PubMed abstracts and 87.37% were found in the given full texts.
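The steps named in the Results paragraph (query generation from two proteins, abstract collection, synonym recognition, interaction extraction) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe their query syntax, synonym source, or extraction rules. The function names, the `INTERACTION_WORDS` cue list, and the protein names `ProtA`, `PA1`, and `ProtB` are hypothetical. The sketch only builds an NCBI E-utilities ESearch URL and does not send any request; sentence-level co-occurrence with a cue word stands in for whatever extraction the paper actually uses.

```python
import re
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint; this sketch builds the URL but never calls it.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical cue words; the paper's real extraction rules are not given in the abstract.
INTERACTION_WORDS = ("interact", "bind", "associate", "phosphorylate")


def build_query(names_a, names_b):
    """OR together all names of each protein, then AND the two groups."""
    group_a = " OR ".join(f'"{n}"[tiab]' for n in names_a)
    group_b = " OR ".join(f'"{n}"[tiab]' for n in names_b)
    return f"({group_a}) AND ({group_b})"


def build_search_url(query, retmax=200):
    """Return an ESearch URL for the query (assumed parameters: db, term, retmax, retmode)."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def mentions(sentence, names):
    """True if any name or synonym occurs in the sentence as a whole word."""
    return any(
        re.search(r"\b" + re.escape(n) + r"\b", sentence, re.IGNORECASE)
        for n in names
    )


def evidence_sentences(abstract, names_a, names_b):
    """Keep sentences that mention both proteins and contain an interaction cue word."""
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    hits = []
    for sentence in sentences:
        lowered = sentence.lower()
        has_cue = any(word in lowered for word in INTERACTION_WORDS)
        if has_cue and mentions(sentence, names_a) and mentions(sentence, names_b):
            hits.append(sentence)
    return hits


if __name__ == "__main__":
    # Made-up protein names and abstract text, for illustration only.
    a, b = ["ProtA", "PA1"], ["ProtB"]
    text = ("ProtA was purified from yeast. "
            "We show that PA1 binds ProtB in vitro. "
            "ProtB is abundant.")
    print(build_search_url(build_query(a, b)))
    print(evidence_sentences(text, a, b))
```

Matching on synonyms as well as primary names is what lets the second sentence of the toy abstract count as evidence even though it never uses the name `ProtA`.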
Pages: E220-E226
Page count: 7