A framework for the evaluation of session reconstruction heuristics in web-usage analysis

Cited by: 218
Authors
Spiliopoulou, M
Mobasher, B
Berendt, B
Nakagawa, M
Affiliations
[1] Univ Magdeburg, Res Grp Knowledge Management & Discovery, ITI, D-39016 Magdeburg, Germany
[2] Depaul Univ, Sch Comp Sci Telecommun & Informat Syst, Chicago, IL 60604 USA
[3] Humboldt Univ, Inst Informat Syst, D-10178 Berlin, Germany
Keywords
web-usage mining; data preparation; reconstruction of web server sessions
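DOI
10.1287/ijoc.15.2.171.14445
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203 [Computer application technology]; 0835 [Software engineering]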
Abstract
Web-usage mining has become the subject of intensive research as its potential for personalized services, adaptive Web sites, and customer profiling has been recognized. However, the reliability of Web-usage mining results depends heavily on the proper preparation of the input datasets. In particular, errors in the reconstruction of sessions and incomplete tracing of users' activities in a site can easily lead to invalid patterns and wrong conclusions. In this study, we evaluate the performance of heuristics employed to reconstruct sessions from server log data. Such heuristics partition activities first by user and then by the user's individual visits to the site, where user identification mechanisms such as cookies may or may not be available. We propose a set of performance measures that are sensitive to two types of reconstruction errors and are appropriate for different knowledge discovery in databases (KDD) applications. We have tested our framework on the Web server data of a frame-based Web site. The first experiment concerned a specific KDD application and showed the sensitivity of the heuristics to particularities of the site's structure and traffic. The second experiment was not bound to a specific application; instead, it compared the performance of the heuristics across different measures, and thus across different application types. Our results show that there is no single best heuristic, but our measures help the analyst select the heuristic best suited to the application at hand.
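The two-step partitioning described in the abstract (first by user, then by visit) and the idea of scoring a heuristic against known sessions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout, all function names, the 30-minute total-duration and 10-minute inter-request thresholds (common defaults in the web-usage-mining literature), and the "complete reconstruction rate" measure are hypothetical simplifications chosen for this example. The ground-truth sessions are assumed to be known, e.g. from a log that carries session identifiers.

```python
from typing import Dict, List, Tuple

# Hypothetical log record: (user_key, timestamp_in_seconds, url).
# user_key is assumed to be a cookie ID, or e.g. IP + user agent when no cookie exists.
Request = Tuple[str, float, str]
Session = List[Request]


def split_by_user(log: List[Request]) -> Dict[str, List[Request]]:
    """Step 1: group requests by user key, each group sorted by timestamp."""
    by_user: Dict[str, List[Request]] = {}
    for req in sorted(log, key=lambda r: (r[0], r[1])):
        by_user.setdefault(req[0], []).append(req)
    return by_user


def sessionize_by_duration(reqs: List[Request], max_duration: float = 30 * 60) -> List[Session]:
    """Assumed time heuristic: a session may span at most max_duration seconds in total."""
    sessions: List[Session] = []
    for req in reqs:
        # Compare against the FIRST request of the current session.
        if sessions and req[1] - sessions[-1][0][1] <= max_duration:
            sessions[-1].append(req)
        else:
            sessions.append([req])
    return sessions


def sessionize_by_gap(reqs: List[Request], max_gap: float = 10 * 60) -> List[Session]:
    """Assumed page-stay heuristic: a gap longer than max_gap seconds starts a new session."""
    sessions: List[Session] = []
    for req in reqs:
        # Compare against the LAST request of the current session.
        if sessions and req[1] - sessions[-1][-1][1] <= max_gap:
            sessions[-1].append(req)
        else:
            sessions.append([req])
    return sessions


def reconstruct(log: List[Request], heuristic) -> List[Session]:
    """Step 2: apply a per-user heuristic to split each user's requests into visits."""
    return [s for reqs in split_by_user(log).values() for s in heuristic(reqs)]


def complete_reconstruction_rate(real: List[Session], constructed: List[Session]) -> float:
    """Hypothetical measure: share of real sessions reproduced exactly by some constructed one."""
    if not real:
        return 0.0
    constructed_set = {tuple(s) for s in constructed}
    return sum(1 for s in real if tuple(s) in constructed_set) / len(real)


# Toy data: u2 makes one real visit with a 15-minute pause between two pages.
EXAMPLE_LOG: List[Request] = [
    ("u1", 0, "/a"), ("u1", 300, "/b"), ("u1", 2400, "/c"),
    ("u2", 100, "/a"), ("u2", 1000, "/d"),
]
REAL_SESSIONS: List[Session] = [
    [("u1", 0, "/a"), ("u1", 300, "/b")],
    [("u1", 2400, "/c")],
    [("u2", 100, "/a"), ("u2", 1000, "/d")],
]

if __name__ == "__main__":
    for h in (sessionize_by_duration, sessionize_by_gap):
        rate = complete_reconstruction_rate(REAL_SESSIONS, reconstruct(EXAMPLE_LOG, h))
        print(h.__name__, round(rate, 3))
    # prints: sessionize_by_duration 1.0
    #         sessionize_by_gap 0.667
```

On this toy log the gap-based heuristic splits u2's single visit in two, while the duration-based heuristic does not, so the two heuristics score differently on the same data. This mirrors, in miniature, the abstract's point that the best heuristic depends on the site's traffic and on the measure the application cares about.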
Pages: 171-190
Page count: 20