A Combination Approach to Web User Profiling

Cited by: 115
Authors
Tang, Jie [1]
Yao, Limin [2]
Zhang, Duo [3]
Zhang, Jing [1]
Affiliations
[1] Tsinghua Univ, Beijing 100084, Peoples R China
[2] Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA
[3] Univ Illinois, Siebel Ctr Comp Sci 1125, Urbana, IL 61801 USA
Keywords
User profiling; information extraction; name disambiguation; topic modeling; social network; text mining; EXTRACTION
DOI
10.1145/1870096.1870098
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this article, we study the problem of Web user profiling, which is aimed at finding, extracting, and fusing the "semantic"-based user profile from the Web. Previously, Web user profiling was often undertaken by creating a list of keywords for the user, which is (sometimes even highly) insufficient for main applications. This article formalizes the profiling problem as several subtasks: profile extraction, profile integration, and user interest discovery. We propose a combination approach to deal with the profiling tasks. Specifically, we employ a classification model to identify relevant documents for a user from the Web and propose a Tree-Structured Conditional Random Fields (TCRF) to extract the profile information from the identified documents; we propose a unified probabilistic model to deal with the name ambiguity problem (several users with the same name) when integrating the profile information extracted from different sources; finally, we use a probabilistic topic model to model the extracted user profiles, and construct the user interest model. Experimental results on an online system show that the combination approach to different profiling tasks clearly outperforms several baseline methods. The extracted profiles have been applied to expert finding, an important application on the Web. Experiments show that the accuracy of expert finding can be improved (ranging from +6% to +26% in terms of MAP) by taking advantage of the profiles.
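The abstract reports expert-finding gains in terms of MAP (mean average precision). As a reading aid, the following is a minimal sketch of how MAP is conventionally computed over ranked result lists. The function names and the toy expert identifiers are illustrative assumptions, not taken from the article, and the article's exact evaluation protocol may differ.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked list against a set of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item appears.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of the per-query average precision values."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy data: two queries, each with a ranked list of expert IDs
# and the set of experts judged relevant for that query.
toy_runs = [
    (["a", "b", "c"], {"a", "c"}),
    (["x", "a"], {"a"}),
]
map_score = mean_average_precision(toy_runs)
```

Under this definition, a relative improvement in MAP means that relevant experts appear, on average, nearer the top of the ranked lists.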
Pages: 44