Learning with Click Graph for Query Intent Classification

Cited by: 18
Authors
Li, Xiao [1]
Wang, Ye-Yi [1]
Shen, Dou [1]
Acero, Alex [1]
Affiliations
[1] Microsoft Corp, Redmond, WA 98052 USA
Keywords
Algorithms; Experimentation; Semisupervised learning; query classification; user intent; click graph
DOI
10.1145/1777432.1777435
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Topical query classification, as one step toward understanding users' search intent, is gaining increasing attention in information retrieval. Previous work on this subject has primarily focused on enriching query features, for example, by augmenting queries with search engine results. In this work, we investigate a completely orthogonal approach: instead of improving the feature representation, we aim to drastically increase the amount of training data. To this end, we propose two semisupervised learning methods that exploit user click-through data. In the first approach, we infer the class memberships of unlabeled queries from those of labeled ones according to their proximity in a click graph, and then use these automatically labeled queries to train classifiers with query terms as features. In the second approach, click graph learning and query classifier training are conducted jointly under an integrated objective. Our methods are evaluated in two applications, product intent and job intent classification. In both cases, we expand the training data by over two orders of magnitude, leading to significant improvements in classification performance. An additional finding is that, with a large amount of training data obtained in this fashion, a classifier based on simple query term features can outperform those using state-of-the-art, augmented features.
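The first approach described in the abstract, inferring labels for unlabeled queries from labeled ones through a click graph, can be illustrated with a generic label-propagation sketch. This is not the authors' exact formulation: the function name `propagate_labels`, the click-weighted averaging rule, the 0.5 starting score for unlabeled queries, and the toy queries and URLs are all assumptions made for illustration. Click counts are assumed to be strictly positive.

```python
from collections import defaultdict


def propagate_labels(clicks, seeds, n_iter=10):
    """Spread seed labels over a bipartite query-URL click graph.

    clicks: dict mapping (query, url) -> positive click count
    seeds:  dict mapping query -> 1.0 (positive) or 0.0 (negative)
    Returns a dict mapping every query to a score in [0, 1].
    (Hypothetical helper; a simplified stand-in for click graph learning.)
    """
    q_to_u = defaultdict(dict)
    u_to_q = defaultdict(dict)
    for (q, u), count in clicks.items():
        q_to_u[q][u] = count
        u_to_q[u][q] = count

    # Unlabeled queries start at an uninformative 0.5 (an assumption).
    q_score = {q: seeds.get(q, 0.5) for q in q_to_u}

    for _ in range(n_iter):
        # A URL's score is the click-weighted average of its queries' scores.
        u_score = {
            u: sum(c * q_score[q] for q, c in qs.items()) / sum(qs.values())
            for u, qs in u_to_q.items()
        }
        # A query's score is the click-weighted average of its URLs' scores;
        # seed queries are clamped to their known labels.
        for q, us in q_to_u.items():
            if q in seeds:
                q_score[q] = seeds[q]
            else:
                q_score[q] = (
                    sum(c * u_score[u] for u, c in us.items()) / sum(us.values())
                )
    return q_score


# Toy click log (made-up queries and URLs, for illustration only).
clicks = {
    ("buy laptop", "shop.example/laptops"): 10,
    ("cheap notebook pc", "shop.example/laptops"): 5,
    ("weather seattle", "weather.example/seattle"): 8,
    ("seattle forecast", "weather.example/seattle"): 4,
}
seeds = {"buy laptop": 1.0, "weather seattle": 0.0}

scores = propagate_labels(clicks, seeds)
# Queries scoring above a chosen threshold could then be added to the
# training set of a classifier that uses query terms as features.
auto_labeled_positive = [q for q, s in scores.items() if s > 0.5 and q not in seeds]
```

In this toy graph, the unlabeled query that shares a clicked URL with the positive seed drifts toward the positive class, and the one that shares a URL with the negative seed drifts toward the negative class. The 0.5 threshold is likewise an illustrative choice, not one taken from the paper.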
Pages: 20