EBIMed - text crunching to gather facts for proteins from Medline

Cited by: 120
Authors
Rebholz-Schuhmann, Dietrich [1]
Kirsch, Harald [1]
Arregui, Miguel [1]
Gaudan, Sylvain [1]
Riethoven, Mark [1]
Stoehr, Peter [1]
Affiliation
[1] EBI, Cambridge CB10 1SD, England
Keywords
DOI
10.1093/bioinformatics/btl302
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
To allow efficient and systematic retrieval of statements from Medline we have developed EBIMed, a service that combines document retrieval with co-occurrence-based analysis of Medline abstracts. Upon keyword query, EBIMed retrieves the abstracts from EMBL-EBI's installation of Medline and filters for sentences that contain biomedical terminology maintained in public bioinformatics resources. The extracted sentences and terminology are used to generate an overview table on proteins, Gene Ontology (GO) annotations, drugs and species used in the same biological context. All terms in retrieved abstracts and extracted sentences are linked to their entries in biomedical databases. We assessed the quality of the identification of terms and relations in the retrieved sentences. More than 90% of the protein names found indeed represented a protein. According to the analysis of four protein-protein pairs from the Wnt pathway we estimated that 37% of the statements containing such a pair mentioned a meaningful interaction and clarified the interaction of Dkk with LRP. We conclude that EBIMed improves access to information where proteins and drugs are involved in the same biological process, e.g. statements with GO annotations of proteins, protein-protein interactions and effects of drugs on proteins.
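The co-occurrence step described in the abstract can be illustrated with a minimal sketch. It is not EBIMed's implementation: the `TERMS` dictionary, the function names, and the naive sentence splitter are all assumptions made for illustration, whereas the real service draws its terminology from public bioinformatics resources. The sketch filters sentences for dictionary terms and counts each pair of terms that appears in the same sentence, keeping the supporting sentences as evidence.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy dictionaries; the actual service uses terminology
# maintained in public bioinformatics resources.
TERMS = {
    "protein": ["Dkk1", "LRP6", "Wnt3a"],
    "GO": ["signal transduction"],
    "drug": ["lithium"],
}

def split_sentences(text):
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace.
    It will wrongly split on abbreviations such as 'e.g.'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def find_terms(sentence):
    """Return the set of (category, term) pairs found in one sentence."""
    found = set()
    for category, names in TERMS.items():
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            if re.search(pattern, sentence, re.IGNORECASE):
                found.add((category, name))
    return found

def cooccurrence_table(abstracts):
    """Count term pairs that share a sentence and keep those sentences."""
    counts = Counter()
    evidence = {}
    for abstract in abstracts:
        for sentence in split_sentences(abstract):
            terms = sorted(find_terms(sentence))  # stable pair ordering
            for pair in combinations(terms, 2):
                counts[pair] += 1
                evidence.setdefault(pair, []).append(sentence)
    return counts, evidence
```

Calling `cooccurrence_table` on a list of abstract strings returns the pair counts, which play the role of the overview table, together with the sentences behind each count. A raw count says nothing about whether a sentence states a meaningful relation, which is consistent with the abstract's estimate that only part of the co-occurring statements describe a real interaction.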
Pages: E237-E244
Number of pages: 8