Biomedical knowledge navigation by literature clustering

Times cited: 28
Authors
Yamamoto, Yasunori [1]
Takagi, Toshihisa [1]
Affiliations
[1] Univ Tokyo, Dept Computat Biol, Kashiwa, Chiba 2778561, Japan
Keywords
clustering; text-mining; automatic labeling; summarization
DOI
10.1016/j.jbi.2006.07.004
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
There is an urgent need for a system that facilitates surveys by biomedical researchers and the subsequent formulation of hypotheses based on the knowledge stored in literature. One approach is to cluster papers discussing a topic of interest and reveal its sub-topics that allow researchers to acquire an overview of the topic. We developed such a system called McSyBi. It accepts a set of citation data retrieved with PubMed and hierarchically and non-hierarchically clusters them based on the titles and the abstracts using statistical and natural language processing methods. A novel point is that McSyBi allows its users to change the clustering by entering a MeSH term or UMLS Semantic Type, and therefore they can see a set of citation data from multiple aspects. We evaluated McSyBi quantitatively and qualitatively: clustering of 27 sets of citation data (40643 different papers) and scrutiny of several resultant clusters. While non-hierarchical clustering provides us with an overview of the target topic, hierarchical clustering allows us to see more details and relationships among citation data. McSyBi is freely available at http://textlens.hgc.jp/McSyBi/. (c) 2006 Elsevier Inc. All rights reserved.
Pages: 114-130
Page count: 17