Data Preparation for Mining World Wide Web Browsing Patterns

Cited by: 33
Authors
Robert Cooley
Bamshad Mobasher
Jaideep Srivastava
Affiliation
[1] University of Minnesota, Department of Computer Science and Engineering
Keywords
Data mining; World Wide Web; association rules; sequential patterns; path analysis
DOI
10.1007/BF03325089
Abstract
The World Wide Web (WWW) continues to grow at an astounding rate in both the sheer volume of traffic and the size and complexity of Web sites. The complexity of tasks such as Web site design, Web server design, and simply navigating through a Web site has increased along with this growth. An important input to these design tasks is the analysis of how a Web site is being used. Usage analysis includes straightforward statistics, such as page access frequency, as well as more sophisticated forms of analysis, such as finding the common traversal paths through a Web site. Web Usage Mining is the application of data mining techniques to usage logs of large Web data repositories in order to produce results that can be used in the design tasks mentioned above. However, several preprocessing tasks must be performed before data mining algorithms can be applied to the data collected from server logs. This paper presents several data preparation techniques for identifying unique users and user sessions. In addition, a method for dividing user sessions into semantically meaningful transactions is defined and successfully tested against two other methods. Transactions identified by the proposed methods are used to discover association rules from real-world data using the WEBMINER system [15].
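The abstract does not spell out how users and sessions are identified, so the following is only a minimal sketch of one common heuristic, not the paper's stated procedure. It assumes that a user can be approximated by the pair of IP address and user agent, and that a gap of more than 30 minutes between requests starts a new session. The function name `identify_sessions`, the tuple layout of the log entries, and the timeout value are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Assumed inactivity threshold in seconds; not taken from the abstract.
SESSION_TIMEOUT = 30 * 60


def identify_sessions(entries, timeout=SESSION_TIMEOUT):
    """Split log entries into per-user sessions.

    entries: iterable of (ip, agent, timestamp_seconds, url) tuples.
    Returns a list of ((ip, agent), [url, ...]) pairs, one per session.
    """
    # Approximate a "user" by the (ip, agent) pair (an assumption).
    by_user = defaultdict(list)
    for ip, agent, ts, url in entries:
        by_user[(ip, agent)].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort(key=lambda hit: hit[0])  # order requests by time
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                # Gap exceeds the threshold: close the session, start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions
```

Real server logs complicate this picture with proxies, caching, and shared machines, which is why a sketch like this should be read as a starting point rather than a complete preprocessing step.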
Pages: 5-32
Page count: 27