Summarizing Similarities and Differences Among Related Documents

Cited by: 61
Authors
Inderjeet Mani
Eric Bloedorn
Source
Information Retrieval | 1999 / Vol. 1 / Issue 1-2
Keywords
text summarization; information retrieval; natural language processing
DOI
10.1023/A:1009930203452
Abstract
In many modern information retrieval applications, a common problem is the existence of multiple documents covering similar information, as with multiple news stories about an event or a sequence of events. A particular challenge for text summarization is to summarize the similarities and differences in information content among these documents. The approach described here exploits the results of recent progress in information extraction to represent salient units of text and their relationships. By exploiting meaningful relations between units based on an analysis of text cohesion and the context in which the comparison is desired, the summarizer can pinpoint similarities and differences, and align text segments. In evaluation experiments, these techniques for exploiting cohesion relations result in summaries that (i) help users complete a retrieval task more quickly, (ii) improve alignment accuracy over baselines, and (iii) improve identification of topic-relevant similarities and differences.
Pages: 35-67
Page count: 32