Named Entity Mining from User Query Logs Based on a Semi-Supervised Topic Model

Cited by: 6
Authors
Cao Lei (曹雷) [1,2]
Guo Jiafeng (郭嘉丰) [1]
Bai Lu (白露) [1,2]
Cheng Xueqi (程学旗) [1]
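Affiliations
[1] Research Center for Web Data Science and Engineering, Institute of Computing Technology, Chinese Academy of Sciences
[2] Graduate University of Chinese Academy of Sciences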
Keywords
user query logs; named entity mining; semi-supervised topic model
DOI
Not available
Chinese Library Classification (CLC) number
TP391.1 [Text information processing]
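Abstract
Named entity mining from user query logs aims to extract named entities of specified classes from the logs. Previous work proposed a seed-based mining method that ranks candidate entities by the similarity between the template distribution of an entity class and that of each candidate entity. However, that method ignores the ambiguity of named entities, the polysemy of query templates, and the information carried by unlabeled entities, so it cannot rank candidate entities effectively. This paper adopts a semi-supervised topic model that exploits the relations among query templates to learn the template distribution of each entity class, thereby improving the ranking of candidate entities. Experimental results demonstrate the effectiveness of the proposed method.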
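The seed-based baseline that the abstract describes can be illustrated with a minimal sketch. It is not the paper's method: the toy queries, the function names, the `#` placeholder, and the choice of cosine similarity are all assumptions made for illustration, and the abstract does not state which similarity measure the earlier work used.

```python
import math
from collections import Counter


def template_distribution(entities, queries):
    """Normalized counts of query templates for the given entities.

    A template is the query with the entity replaced by '#'.
    Matching is naive substring matching (an assumption of this sketch).
    """
    counts = Counter()
    for query in queries:
        for entity in entities:
            if entity in query:
                counts[query.replace(entity, "#").strip()] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse distributions stored as dicts."""
    dot = sum(v * q.get(t, 0.0) for t, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def rank_candidates(seeds, candidates, queries):
    """Rank candidates by similarity to the seed class's template distribution."""
    class_dist = template_distribution(seeds, queries)
    scored = [
        (cand, cosine(class_dist, template_distribution([cand], queries)))
        for cand in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy query log; the seeds stand for a "movie" class.
queries = [
    "harry potter trailer", "harry potter cast",
    "inception trailer", "inception cast",
    "avatar trailer", "avatar cast",
    "python tutorial", "python download",
]
ranking = rank_candidates(["harry potter", "inception"], ["avatar", "python"], queries)
```

In this toy log, `avatar` shares both templates with the seeds and ranks above `python`, which shares none. The sketch assigns each entity a single template distribution, so it cannot separate the different senses of an ambiguous entity or of a polysemous template, which is the limitation the abstract attributes to the earlier method.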
Pages: 26-32
Page count: 7
Related papers
10 entries
[1] Named entity recognition in query. Jiafeng Guo, Gu Xu, Xueqi Cheng, et al. Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2009
[2] A cross-collection mixture model for comparative text mining. ChengXiang Zhai, Atulya Velivelli, Bei Yu. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '04). 2004
[3] Named entity mining from click-through data using weakly supervised latent Dirichlet allocation. Gu Xu, Shuang-Hong Yang, Hang Li. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2009
[4] Using search session context for named entity recognition in query. Junwu Du, Zhimin Zhang, Jun Yan, et al. Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2010
[5] Supervised topic models. David M. Blei, Jon D. McAuliffe. Proceedings of the 21st Annual Conference on Neural Information Processing Systems. 2007
[6] Regularized estimation of mixture models for robust pseudo-relevance feedback. Tao Tao, ChengXiang Zhai. Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2006
[7] Probabilistic latent semantic indexing. Thomas Hofmann. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 1999
[8] Named entity mining from user query logs [J]. Zhai Haijun, Guo Jiafeng, Wang Xiaolei, Xu Hongbo. Journal of Chinese Information Processing (中文信息学报), 2010, 24(01): 71-76+116
[9] Weakly-supervised discovery of named entities using Web search queries. Marius Paşca. Proceedings of the 16th ACM Conference on Information and Knowledge Management. 2007
[10] Opinion integration through semi-supervised topic modeling. Y. Lu, C. Zhai. Proceedings of the 17th International Conference on World Wide Web (WWW '08). 2008