Exploiting Neighborhood Knowledge for Single Document Summarization and Keyphrase Extraction

Cited by: 171
Authors
Wan, Xiaojun [1, 2]
Xiao, Jianguo [1 ]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, Beijing 100871, Peoples R China
[2] Peking Univ, Key Lab Computat Linguist, MOE, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Algorithms; Experimentation; Document summarization; keyphrase extraction; neighborhood knowledge; graph-based ranking
DOI
10.1145/1740592.1740596
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Document summarization and keyphrase extraction are two related tasks in the IR and NLP fields, and both of them aim at extracting condensed representations from a single text document. Existing methods for single document summarization and keyphrase extraction usually make use of only the information contained in the specified document. This article proposes using a small number of nearest neighbor documents to improve document summarization and keyphrase extraction for the specified document, under the assumption that the neighbor documents could provide additional knowledge and more clues. The specified document is expanded to a small document set by adding a few neighbor documents close to the document, and the graph-based ranking algorithm is then applied on the expanded document set to make use of both the local information in the specified document and the global information in the neighbor documents. Experimental results on the Document Understanding Conference (DUC) benchmark datasets demonstrate the effectiveness and robustness of our proposed approaches. The cross-document sentence relationships in the expanded document set are validated to be beneficial to single document summarization, and the word cooccurrence relationships in the neighbor documents are validated to be very helpful to single document keyphrase extraction.
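The expansion-then-ranking idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`rank_sentences` and its helpers), cosine similarity over raw term counts, and the PageRank-style iteration are all assumptions chosen for brevity, and the sketch does not weight within-document and cross-document links differently, which the article may do.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphabetic tokens; a deliberately crude tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def split_sentences(doc):
    # Split on whitespace that follows sentence-ending punctuation.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def rank_sentences(target, corpus, k=2, damping=0.85, iters=50):
    """Rank the target document's sentences using k neighbor documents.

    Hypothetical helper: neighbors are the k corpus documents most
    similar to the target; all sentences form one similarity graph.
    """
    target_vec = Counter(tokenize(target))
    neighbors = sorted(
        corpus,
        key=lambda d: cosine(target_vec, Counter(tokenize(d))),
        reverse=True,
    )[:k]

    # Expanded document set: (sentence, belongs_to_target) pairs.
    sentences = [(s, True) for s in split_sentences(target)]
    for doc in neighbors:
        sentences += [(s, False) for s in split_sentences(doc)]

    vecs = [Counter(tokenize(s)) for s, _ in sentences]
    n = len(sentences)
    # Edge weights: pairwise sentence similarity, no self-loops.
    w = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]

    # PageRank-style power iteration over the weighted graph.
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * w[j][i] / out[j]
                            for j in range(n) if out[j] > 0)
            for i in range(n)
        ]

    # Only sentences from the target document are candidates for the summary.
    ranked = sorted(
        ((scores[i], sentences[i][0]) for i in range(n) if sentences[i][1]),
        reverse=True,
    )
    return [s for _, s in ranked]
```

Calling `rank_sentences(document, corpus, k=2)` returns the target document's sentences ordered by score; a summary would take the top few. Neighbor sentences influence the scores through the graph but are never returned themselves.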
Pages: 34