WEB MINING: A SURVEY OF CURRENT RESEARCH, TECHNIQUES, AND SOFTWARE

Times cited: 44
Authors
Zhang, Qingyu [1]
Segall, Richard S. [1]
Affiliation
[1] Arkansas State Univ, Dept Comp & Informat Technol, State Univ, AR 72467 USA
Keywords
Web mining; web content mining; web usage mining; web structure mining; web mining software
DOI
10.1142/S0219622008003150
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The purpose of this paper is to provide an updated evaluation of current web mining research and the techniques available. Current advances are reviewed for each of the three types of web mining: web content mining, web usage mining, and web structure mining. For each tabulated research work, we examine key issues such as the web mining process, methods/techniques, applications, data sources, and software used. Unlike previous investigators, we divide the web mining process into the following five subtasks: (1) resource finding and retrieving, (2) information selection and preprocessing, (3) pattern analysis and recognition, (4) validation and interpretation, and (5) visualization. This paper also reports comparisons and summaries of selected web mining software. The software packages selected for discussion and comparison are SPSS Clementine, Megaputer PolyAnalyst, ClickTracks by web analytics, and QL2 by QL2 Software Inc. Applications of these packages to available data sets are discussed and illustrated with numerous screenshots, followed by conclusions and future directions of the research.
Pages: 683-720
Page count: 38