Time series analysis of a Web search engine transaction log

Times cited: 34
Authors
Zhang, Ying [2]
Jansen, Bernard J. [1]
Spink, Amanda [3]
Affiliations
[1] Penn State Univ, Coll Informat Sci & Technol, University Pk, PA 16802 USA
[2] Penn State Univ, Coll Engn, Harold & Inge Marcus Dept Ind & Mfg Engn, University Pk, PA 16802 USA
[3] Queensland Univ Technol, Fac Informat Technol, Brisbane, Qld 4001, Australia
Keywords
ARIMA; Box-Jenkins model; Search engine; Time series analysis; Transactional log; QUERIES; PATTERNS; TRENDS
DOI
10.1016/j.ipm.2008.07.003
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we use time series analysis to evaluate predictive scenarios using search engine transactional logs. Our goal is to develop models for analyzing searchers' behaviors over time and to investigate whether time series analysis is a valid method for predicting relationships between searcher actions. Time series analysis is a method often used to understand the underlying characteristics of temporal data in order to make forecasts. In this study, we used a Web search engine transactional log and time series analysis to investigate users' actions. We conducted our analysis in two phases. In the initial phase, we employed a basic analysis and found that 10% of searchers clicked on sponsored links. However, from 22:00 to 24:00, searchers almost exclusively clicked on organic links, with almost no clicks on sponsored links. In the second and more extensive phase, we used a one-step prediction time series analysis method along with a transfer function method. The period rarely affects navigational and transactional queries, while rates for transactional queries vary during different periods. Our results show that the average length of a searcher session is approximately 2.9 interactions and that this average is consistent across time periods. Most importantly, our findings show that searchers who submit the shortest queries (i.e., in number of terms) click on the highest-ranked results. We discuss implications, including predictive value, and future research. (C) 2008 Elsevier Ltd. All rights reserved.
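The abstract refers to a one-step prediction time series method. As a hedged illustration of what "one-step prediction" means, the sketch below fits a first-order autoregressive model, AR(1), y_t = c + phi * y_{t-1}, by ordinary least squares and then predicts each next value from the current one. AR(1) is a deliberately simpler stand-in for the ARIMA (Box-Jenkins) and transfer-function models named in the keywords; it is not the authors' model. The function names and the synthetic series are hypothetical and are not data from the study.

```python
from typing import List, Tuple


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y_t = c + phi * y_{t-1}; returns (c, phi).

    Hypothetical helper: a minimal AR(1) stand-in, not a full ARIMA fit.
    """
    if len(series) < 3:
        raise ValueError("need at least three observations")
    x = series[:-1]  # lagged values y_{t-1}
    y = series[1:]   # current values y_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("lagged series is constant; phi is not identifiable")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def one_step_predictions(series: List[float], c: float, phi: float) -> List[float]:
    """In-sample one-step-ahead predictions yhat_t = c + phi * y_{t-1}."""
    return [c + phi * prev for prev in series[:-1]]


def forecast_next(series: List[float], c: float, phi: float) -> float:
    """One-step-ahead forecast of the value after the last observation."""
    return c + phi * series[-1]


if __name__ == "__main__":
    # Synthetic, noise-free hourly series generated by y_t = 2 + 0.5 * y_{t-1}.
    demo = [10.0]
    for _ in range(7):
        demo.append(2.0 + 0.5 * demo[-1])
    c, phi = fit_ar1(demo)
    print(f"c={c:.3f}, phi={phi:.3f}, next={forecast_next(demo, c, phi):.4f}")
```

Because the synthetic series is noise-free, the fit recovers c = 2 and phi = 0.5 and the next-hour forecast is 2 + 0.5 * 4.046875. On real log data, one would instead compare one-step predictions against held-out observations to judge predictive value.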
Pages: 230-245
Page count: 16