BEYOND RANKED LISTS IN WEB SEARCH: AGGREGATING WEB CONTENT INTO TOPIC PAGES

Cited by: 2
Authors
Balasubramanian, Niranjan [1]
Cucerzan, Silviu [2]
Affiliations
[1] Univ Massachusetts, 140 Governors Dr, Amherst, MA 01003 USA
[2] Microsoft Res, Redmond, WA 98052 USA
Keywords
Web search; topic page; query log; aspect model
DOI
10.1142/S1793351X10001103
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We investigate the automatic generation of topic pages as an alternative to the current Web search paradigm. Topic pages explicitly aggregate information across documents, filter redundancy, and promote diversity of topical aspects. We propose a novel framework for building rich topical aspect models and selecting diverse information from the Web. In particular, we use Web search logs to build aspect models with various degrees of specificity, and then employ these aspect models as input to a sentence selection method that identifies relevant and non-redundant sentences from the Web. Automatic and manual evaluations on biographical topics show that topic pages built by our system compare favorably to regular Web search results and to MDS-style summaries of the Web results on all metrics employed.
Pages: 509-534
Page count: 26