The next generation of literature analysis: Integration of genomic analysis into text mining

Cited by: 68
Authors
Scherf, M [1]
Epple, A [1]
Werner, T [1]
Affiliation
[1] Genomatix Software GmbH, D-80339 Munich, Germany
Keywords
literature/text mining; gene regulation; promoter analysis; integrated analysis
DOI
10.1093/bib/6.3.287
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Text-mining systems are indispensable tools for reducing the growing flood of information in the scientific literature to the topics relevant to a particular interest. Most scientific literature is published as unstructured free text, which complicates the development of data-processing tools that rely on structured information. To overcome the problems of free-text analysis, structured, hand-curated information derived from the literature is integrated into text-mining systems to improve precision and recall. This paper reviews several text-mining approaches and describes the next step in the development of text-mining systems, which is based on the concept of multiple lines of evidence: results from literature analysis are combined with evidence from experiments and genome analysis to improve the accuracy of results and to generate additional knowledge beyond what is known from the literature alone.
Pages: 287-297
Page count: 11