Exploiting Neighborhood Knowledge for Single Document Summarization and Keyphrase Extraction

Cited by: 171
Authors
Wan, Xiaojun [1, 2]
Xiao, Jianguo [1 ]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, Beijing 100871, Peoples R China
[2] Peking Univ, Key Lab Computat Linguist, MOE, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Algorithms; Experimentation; Document summarization; keyphrase extraction; neighborhood knowledge; graph-based ranking
DOI
10.1145/1740592.1740596
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Document summarization and keyphrase extraction are two related tasks in the IR and NLP fields, and both of them aim at extracting condensed representations from a single text document. Existing methods for single document summarization and keyphrase extraction usually make use of only the information contained in the specified document. This article proposes using a small number of nearest neighbor documents to improve document summarization and keyphrase extraction for the specified document, under the assumption that the neighbor documents could provide additional knowledge and more clues. The specified document is expanded to a small document set by adding a few neighbor documents close to the document, and the graph-based ranking algorithm is then applied on the expanded document set to make use of both the local information in the specified document and the global information in the neighbor documents. Experimental results on the Document Understanding Conference (DUC) benchmark datasets demonstrate the effectiveness and robustness of our proposed approaches. The cross-document sentence relationships in the expanded document set are validated to be beneficial to single document summarization, and the word cooccurrence relationships in the neighbor documents are validated to be very helpful to single document keyphrase extraction.
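The expansion-then-ranking idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made only for illustration: the neighbor documents are supplied by the caller rather than retrieved, sentence similarity is plain bag-of-words cosine, the ranking is a PageRank-style power iteration with a damping factor of 0.85, and the names `split_sentences`, `bag_of_words`, `cosine`, and `rank_sentences` are hypothetical. Sentences from the specified document and its neighbors share one graph, but only sentences from the specified document are returned.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on ., !, ? followed by whitespace (assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence):
    """Lowercased term counts; no stemming or stop-word removal."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(document, neighbors, damping=0.85, iterations=50):
    """Rank the sentences of `document` on a graph that also contains
    the sentences of the neighbor documents (assumed already chosen)."""
    doc_sents = split_sentences(document)
    if not doc_sents:
        return []
    all_sents = list(doc_sents)
    for neighbor in neighbors:
        all_sents.extend(split_sentences(neighbor))

    vectors = [bag_of_words(s) for s in all_sents]
    size = len(all_sents)
    # Edge weights: similarity between distinct sentences, within and
    # across documents alike (no per-document weighting in this sketch).
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(size)]
        for i in range(size)
    ]
    out_sums = [sum(row) for row in weights]

    scores = [1.0 / size] * size
    for _ in range(iterations):
        updated = []
        for i in range(size):
            incoming = sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(size)
                if out_sums[j] > 0
            )
            updated.append((1 - damping) / size + damping * incoming)
        scores = updated

    # Only sentences of the specified document are candidates.
    order = sorted(range(len(doc_sents)), key=lambda i: scores[i], reverse=True)
    return [(doc_sents[i], scores[i]) for i in order]
```

Calling `rank_sentences(document, neighbors)` returns the specified document's sentences with their scores, highest first. Taking the top few gives an extractive summary. A keyphrase variant would, under the same assumptions, build the graph over words linked by co-occurrence instead of over sentences.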
Pages: 34