The use of domain-specific concepts in biomedical text summarization

Cited by: 77
Authors
Reeve, Lawrence H.
Han, Hyoil [1]
Brooks, Ari D.
Affiliations
[1] Drexel Univ, Coll Informat Sci & Technol, Philadelphia, PA 19104 USA
[2] Drexel Univ, Coll Med, Philadelphia, PA 19104 USA
Keywords
text summarization; biomedicine; concept chaining; concept frequency
DOI
10.1016/j.ipm.2007.01.026
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text summarization is a method for data reduction. It enables users to reduce the amount of text that must be read while still assimilating the core information. This data reduction is particularly useful in the biomedical domain, where physicians must continuously find clinical trial study information to incorporate into their patient treatment efforts. Such efforts are often hampered by the high volume of publications. This paper presents two independent methods (BioChain and FreqDist) for identifying salient sentences in biomedical texts using concepts derived from domain-specific resources. Our semantic-based method (BioChain) is effective at identifying thematic sentences, while our frequency-distribution method (FreqDist) removes information redundancy. The two methods are then combined to form a hybrid method (ChainFreq). Each method is evaluated using the ROUGE system, which compares system-generated summaries against a set of manually generated summaries. The BioChain and FreqDist methods outperform some common summarization systems, while the ChainFreq method improves upon the base approaches. Our work shows that the best performance is achieved when the two methods are combined. The paper also presents a brief physician's evaluation of three randomly selected papers from an evaluation corpus to show that the author's abstract does not always reflect the entire contents of the full text. © 2007 Elsevier Ltd. All rights reserved.
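As an illustration of the kind of comparison a ROUGE-style evaluation makes, the following minimal sketch computes unigram recall (commonly called ROUGE-1 recall) between one system summary and one manually written reference. It is not the ROUGE toolkit and not the evaluation setup used in the paper: the function name `rouge1_recall`, the whitespace tokenization, and the two toy sentences are assumptions made only for this example.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Clipped unigram overlap divided by the number of reference unigrams.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split, which real evaluations usually refine.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    if not ref_counts:
        return 0.0
    # A reference word is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand_counts[word]) for word, count in ref_counts.items())
    return overlap / sum(ref_counts.values())


# Toy sentences invented for this example, not taken from the paper's corpus.
reference = "the drug reduced tumor size in patients"
candidate = "the drug reduced tumor growth"
score = rouge1_recall(candidate, reference)
print(f"ROUGE-1 recall: {score:.3f}")
```

Here four of the seven reference words appear in the candidate, so the recall is 4/7. A full evaluation would typically average such scores over several reference summaries and over higher-order n-grams as well.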
Pages: 1765-1776
Page count: 12