Truth discovery with multiple conflicting information providers on the Web

Cited by: 365
Authors
Yin, Xiaoxin [1]
Han, Jiawei [2]
Yu, Philip S. [3]
Affiliations
[1] Microsoft Corp, Microsoft Res, Redmond, WA 98052 USA
[2] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[3] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Funding
National Science Foundation (USA)
Keywords
data quality; Web mining; link analysis
DOI
10.1109/TKDE.2007.190745
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The World Wide Web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the Web. Moreover, different websites often provide conflicting information on a subject, such as different specifications for the same product. In this paper, we propose a new problem, called Veracity, i.e., conformity to truth, which studies how to find true facts from a large amount of conflicting information on many subjects that is provided by various websites. We design a general framework for the Veracity problem and invent an algorithm, called TRUTHFINDER, which utilizes the relationships between websites and their information, i.e., a website is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy websites. An iterative method is used to infer the trustworthiness of websites and the correctness of information from each other. Our experiments show that TRUTHFINDER successfully finds true facts among conflicting information and identifies trustworthy websites better than the popular search engines.
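The mutual reinforcement described in the abstract (trustworthy websites provide true facts, and facts provided by trustworthy websites are likely true) can be illustrated with a minimal sketch. This is not the TRUTHFINDER algorithm as published: the function name `iterate_trust`, the starting trust of 0.8, and both update rules (fact confidence as one minus the product of the providers' distrust, website trust as the mean confidence of its facts) are simplifying assumptions chosen for illustration, and conflicts between facts are not modeled.

```python
from collections import defaultdict
from math import prod


def iterate_trust(claims, initial_trust=0.8, max_iters=50, tol=1e-6):
    """Illustrative mutual-reinforcement loop (hypothetical, simplified).

    claims: iterable of (website, fact) pairs.
    Returns (trust per website, confidence per fact).
    """
    facts_of = defaultdict(set)   # website -> facts it provides
    sites_of = defaultdict(set)   # fact -> websites providing it
    for site, fact in claims:
        facts_of[site].add(fact)
        sites_of[fact].add(site)

    trust = {site: initial_trust for site in facts_of}
    confidence = {}
    if not trust:
        return trust, confidence

    for _ in range(max_iters):
        # Assumed rule: a fact is wrong only if every provider is wrong.
        confidence = {
            fact: 1.0 - prod(1.0 - trust[s] for s in sites)
            for fact, sites in sites_of.items()
        }
        # Assumed rule: a website is as trustworthy as its facts on average.
        new_trust = {
            site: sum(confidence[f] for f in facts) / len(facts)
            for site, facts in facts_of.items()
        }
        delta = max(abs(new_trust[s] - trust[s]) for s in trust)
        trust = new_trust
        if delta < tol:
            break
    return trust, confidence


if __name__ == "__main__":
    # Two sites agree on fact "x"; one site alone asserts fact "y".
    example = [("a", "x"), ("b", "x"), ("c", "y")]
    trust, confidence = iterate_trust(example)
    print(trust)
    print(confidence)
```

In this toy input, the fact backed by two websites ends up with higher confidence than the fact backed by one, and its providers end up with higher trust. A faithful implementation would need the details given in the paper itself.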
Pages: 796-808
Number of pages: 13