FACTA: a text search engine for finding associated biomedical concepts

Cited by: 110
Authors
Tsuruoka, Yoshimasa [1, 2]
Tsujii, Jun'ichi [1, 2, 3]
Ananiadou, Sophia [1, 2]
Affiliations
[1] Univ Manchester, Sch Comp Sci, Manchester M13 9PL, Lancs, England
[2] Natl Ctr Text Min NaCTeM, Manchester, Lancs, England
[3] Univ Tokyo, Dept Comp Sci, Tokyo 1138654, Japan
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC)
DOI
10.1093/bioinformatics/btn469
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
FACTA is a text search engine for MEDLINE abstracts. It is designed particularly to help users browse biomedical concepts (e.g. genes/proteins, diseases, enzymes and chemical compounds) that appear in the documents retrieved by a query. The concepts are presented to the user in tabular format and ranked by co-occurrence statistics. Unlike existing systems that provide similar functionality, FACTA pre-indexes not only the words but also the concepts mentioned in the documents. This lets the user issue flexible queries (e.g. free keywords or Boolean combinations of keywords and concepts) and receive results immediately, even when a very large number of documents match the query. The user can also view snippets from MEDLINE as textual evidence of associations between the query terms and the concepts. The concept IDs and their names/synonyms used to build the indexes were collected from several biomedical databases and thesauri, such as UniProt, BioThesaurus, UMLS, KEGG and DrugBank.
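The abstract describes three mechanisms:

* a synonym dictionary that maps names to concept IDs;
* an inverted index keyed by both words and concepts, so that Boolean queries can mix the two;
* a ranking of the concepts that co-occur with the query.

The Python sketch below is a toy illustration of those ideas under stated assumptions. It is not FACTA's implementation. The following are all hypothetical and invented for this example:

* the synonym dictionary and its concept IDs;
* the nested-tuple query format;
* the names `ConceptIndex`, `tag_concepts`, `query_terms` and `rank_concepts`.

The abstract says only that concepts are "ranked based on the co-occurrence statistics". The sketch therefore offers raw document co-occurrence counts and document-level pointwise mutual information (PMI) as two plausible choices, not as FACTA's documented scoring.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical synonym dictionary: surface name -> concept ID.
# IDs are invented for this sketch; FACTA builds its dictionary from resources
# such as UniProt, BioThesaurus, UMLS, KEGG and DrugBank.
SYNONYMS = {
    "p53": "GENE:TP53",
    "tp53": "GENE:TP53",
    "apoptosis": "BIO:APOPTOSIS",
    "breast cancer": "DIS:BREAST_CANCER",
    "caffeine": "CHEM:CAFFEINE",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_concepts(text):
    """Naive dictionary lookup: IDs of all names found as whole-token phrases."""
    padded = " " + " ".join(tokenize(text)) + " "
    return {cid for name, cid in SYNONYMS.items()
            if " " + " ".join(tokenize(name)) + " " in padded}


def query_terms(query):
    """Collect the leaf words/concept IDs of a nested Boolean query."""
    if isinstance(query, str):
        return {query}
    return set().union(*(query_terms(q) for q in query[1:]))


class ConceptIndex:
    """Inverted index whose keys are both lowercase words and concept IDs."""

    def __init__(self, docs):
        self.n_docs = len(docs)
        self.postings = defaultdict(set)  # word or concept ID -> doc IDs
        self.doc_concepts = {}            # doc ID -> concept IDs it mentions
        for doc_id, text in docs.items():
            concepts = tag_concepts(text)
            self.doc_concepts[doc_id] = concepts
            for key in set(tokenize(text)) | concepts:
                self.postings[key].add(doc_id)

    def evaluate(self, query):
        """Evaluate a lowercase word, a concept ID, or ("AND"|"OR", q1, q2, ...)."""
        if isinstance(query, str):
            return set(self.postings.get(query, ()))
        op, *args = query
        hit_sets = [self.evaluate(q) for q in args]
        if op == "AND":
            return set.intersection(*hit_sets)
        if op == "OR":
            return set.union(*hit_sets)
        raise ValueError(f"unknown operator: {op!r}")

    def rank_concepts(self, query, use_pmi=False):
        """Rank concepts co-occurring with the query by count or by PMI."""
        hits = self.evaluate(query)
        exclude = query_terms(query)  # don't report the query's own concepts
        cooc = Counter(c for d in hits for c in self.doc_concepts[d]
                       if c not in exclude)
        scores = {}
        for cid, n in cooc.items():
            if use_pmi:
                # log P(query, concept) / (P(query) * P(concept)), per document.
                p_joint = n / self.n_docs
                p_query = len(hits) / self.n_docs
                p_concept = len(self.postings[cid]) / self.n_docs
                scores[cid] = math.log(p_joint / (p_query * p_concept))
            else:
                scores[cid] = n
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Usage on four made-up "abstracts".
docs = {
    1: "p53 induces apoptosis in breast cancer cells.",
    2: "TP53 mutations are frequent in breast cancer.",
    3: "Caffeine and apoptosis in neurons.",
    4: "Caffeine intake and breast cancer risk.",
}
index = ConceptIndex(docs)
print(index.rank_concepts("GENE:TP53"))
# [('DIS:BREAST_CANCER', 2), ('BIO:APOPTOSIS', 1)]
print(index.rank_concepts(("AND", "caffeine", "DIS:BREAST_CANCER")))
# [('CHEM:CAFFEINE', 1)]
```

In this sketch, concept IDs are stored in the same postings map as words. That is what lets a single query combine a free keyword ("caffeine") with a concept ("DIS:BREAST_CANCER"). Ranking then touches only the documents the query already retrieved.

A real system would differ in several ways:

* it would precompute concept postings over all of MEDLINE;
* it would use a proper longest-match tagger instead of this substring check;
* it would return snippets alongside the table.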
Pages: 2559-2560
Number of pages: 2