Query enrichment for Web-query classification

被引：65

作者：

Shen, Dou

Pan, Rong

Sun, Jian-Tao

Pan, Jeffrey Junfeng

Wu, Kangheng

Yin, Jie

Yang, Qiang

机构：

[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China

[2] Microsoft Res Asia, Beijing, Peoples R China

来源：

ACM TRANSACTIONS ON INFORMATION SYSTEMS | 2006年 / 24卷 / 03期

关键词：

algorithms; experimentation; performance; query classification; query enrichment; synonym-based classifier; ensemble learning; KDDCUP2005;

D O I：

10.1145/1165774.1165776

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Web-search queries are typically short and ambiguous. To classify these queries into certain target categories is a difficult but important problem. In this article, we present a new technique called query enrichment, which takes a short query and maps it to intermediate objects. Based on the collected intermediate objects, the query is then mapped to target categories. To build the necessary mapping functions, we use an ensemble of search engines to produce an enrichment of the queries. Our technique was applied to the ACM Knowledge Discovery and Data Mining competition (ACM KDDCUP) in 2005, where we won the championship on all three evaluation metrics (aprecision, F1 measure, which combines precision and recall, and creativity, which is judged by the organizers) among a total of 33 teams worldwide. In this article, we show that, despite the difficulty of an abundance of ambiguous queries and lack of training data, our query-enrichment technique can solve the problem satisfactorily through a two-phase classification framework. We present a detailed description of our algorithm and experimental evaluation. Our best result for F1 and precision is 42.4% and 44.4%, respectively, which is 9.6% and 24.3% higher than those from the runner-ups, respectively.

引用

页码：320 / 352

页数：33

共 38 条

[1]

Alpaydin E., 2004, Introduction to Machine Learning, V2nd

[2]

[Anonymous], 2005, SIGKDD EXPLORATIONS

[3]

[Anonymous], 1999, ICML

[4]

[Anonymous], 1997, Proceedings of the fourteenth international conference on machine learning, DOI DOI 10.1016/J.ESWA.2008.05.026

[5] An empirical comparison of voting classification algorithms: Bagging, boosting, and variants [J].

Bauer, E ;

Kohavi, R .

MACHINE LEARNING, 1999, 36 (1-2) :105-139

[6]

Beeferman D., 2000, Proceedings. KDD-2000. Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, P407, DOI 10.1145/347090.347176

[7]

Beitzel S. M., 2005, SIGIR 2005. Proceedings of the Twenty-Eighth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P581, DOI 10.1145/1076034.1076138

[8]

Cann A., 2003, MATH SCRATCH BIOL

[9]

Caruana R., 2004, P 21 INT C MACH LEAR, P18, DOI DOI 10.1145/1015330.1015432

[10]

CHANG CH, 1998, WWW7 P 7 INT C WORLD, P621

← 1 2 3 4 →