Credibility-inspired ranking for blog post retrieval

Cited by: 2
Authors
Wouter Weerkamp
Maarten de Rijke
Affiliation
[1] University of Amsterdam, ISLA
Source
Information Retrieval | 2012 / Volume 15
Keywords
Credibility; Blog post retrieval; Reranking
DOI
Not available
Abstract
Credibility of information refers to its believability or the believability of its sources. We explore the impact of credibility-inspired indicators on the task of blog post retrieval, following the intuition that more credible blog posts are preferred by searchers. Based on a previously introduced credibility framework for blogs, we define several credibility indicators, and divide them into post-level (e.g., spelling, timeliness, document length) and blog-level (e.g., regularity, expertise, comments) indicators. The retrieval task at hand is precision-oriented, and we hypothesize that the use of credibility-inspired indicators will positively impact precision. We propose to use ideas from the credibility framework in a reranking approach to the blog post retrieval problem: We introduce two simple ways of reranking the top n of an initial run. The first approach, Credibility-inspired reranking, simply reranks the top n of a baseline based on the credibility-inspired score. The second approach, Combined reranking, multiplies the credibility-inspired score of the top n results by their retrieval score, and reranks based on this score. Results show that Credibility-inspired reranking leads to larger improvements over the baseline than Combined reranking, but both approaches are capable of improving over an already strong baseline. For Credibility-inspired reranking the best performance is achieved using a combination of all post-level indicators. Combined reranking works best using the post-level indicators combined with comments and pronouns. The blog-level indicators expertise, regularity, and coherence do not contribute positively to the performance, although analysis shows that they can be useful for certain topics. 
Additional analysis shows that a relatively small value of n (15–25) leads to the best results, and that posts that move up the ranking due to the integration of reranking based on credibility-inspired indicators do indeed appear to be more credible than the ones that go down.
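The two reranking strategies described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `(doc_id, retrieval_score)` run format, and the `cred` dictionary of precomputed credibility-inspired scores are all hypothetical, and the abstract does not say how the individual indicators are combined into a single score. The sketch also assumes that posts below rank n keep their original order and that ties keep the baseline order.

```python
from typing import Dict, List, Tuple

# Hypothetical run format: (doc_id, retrieval_score), best-ranked first.
Run = List[Tuple[str, float]]


def credibility_rerank(run: Run, cred: Dict[str, float], n: int = 20) -> Run:
    """Rerank the top n purely by credibility-inspired score.

    `cred` maps doc_id to an assumed, precomputed credibility-inspired score.
    Posts below rank n are left untouched (an assumption of this sketch).
    """
    head = sorted(run[:n], key=lambda item: cred[item[0]], reverse=True)
    return head + run[n:]


def combined_rerank(run: Run, cred: Dict[str, float], n: int = 20) -> Run:
    """Rerank the top n by retrieval score multiplied by credibility score."""
    head = sorted(run[:n], key=lambda item: item[1] * cred[item[0]], reverse=True)
    return head + run[n:]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    baseline = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", 0.5)]
    scores = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 1.0}
    print([doc for doc, _ in credibility_rerank(baseline, scores, n=3)])
    print([doc for doc, _ in combined_rerank(baseline, scores, n=3)])
```

The default `n=20` is only a placeholder chosen from within the 15–25 range that the abstract reports as working best. Python's `sorted` is stable, so posts with equal scores stay in their baseline order.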
Pages: 243–277
Number of pages: 34