Adversarial Web Search

Cited by: 49
Authors
Castillo C. [1]
Davison B.D. [2]
Affiliations
[1] Yahoo Research, Barcelona 08018, Catalunya
[2] Lehigh University, Bethlehem, PA 18015
Source
Foundations and Trends in Information Retrieval | 2010 / Vol. 4 / Issue 5
Keywords
Compendex;
DOI
10.1561/1500000021
Abstract
Web search engines have become indispensable tools for finding content. As the popularity of the Web has increased, the efforts to exploit the Web for commercial, social, or political advantage have grown, making it harder for search engines to discriminate between truthful signals of content quality and deceptive attempts to game search engines' rankings. This problem is further complicated by the open nature of the Web, which allows anyone to write and publish anything, and by the fact that search engines must analyze ever-growing numbers of Web pages. Moreover, increasing expectations of users, who over time rely on Web search for information needs related to more aspects of their lives, further deepen the need for search engines to develop effective counter-measures against deception. In this monograph, we consider the effects of the adversarial relationship between search systems and those who wish to manipulate them, a field known as "Adversarial Information Retrieval". We show that search engine spammers create false content and misleading links to lure unsuspecting visitors to pages filled with advertisements or malware. We also examine work over the past decade or so that aims to discover such spamming activities to get spam pages removed or their effect on the quality of the results reduced. Research in Adversarial Information Retrieval has been evolving over time, and currently continues both in traditional areas (e.g., link spam) and newer areas, such as click fraud and spam in social media, demonstrating that this conflict is far from over. © 2011 C. Castillo and B. D. Davison.
Pages: 377-486
Page count: 109