WEB MINING: A SURVEY OF CURRENT RESEARCH, TECHNIQUES, AND SOFTWARE

Cited by: 44
Authors
Zhang, Qingyu [1]
Segall, Richard S. [1]
Affiliation
[1] Arkansas State Univ, Dept Comp & Informat Technol, State Univ, AR 72467 USA
Keywords
Web mining; web content mining; web usage mining; web structure mining; web mining software
DOI
10.1142/S0219622008003150
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The purpose of this paper is to provide a more current evaluation and update of the available web mining research and techniques. Current advances in each of the three types of web mining are reviewed: web content mining, web usage mining, and web structure mining. For each tabulated research work, we examine such key issues as the web mining process, methods/techniques, applications, data sources, and software used. Unlike previous investigators, we divide web mining processes into the following five subtasks: (1) resource finding and retrieving, (2) information selection and preprocessing, (3) pattern analysis and recognition, (4) validation and interpretation, and (5) visualization. This paper also reports comparisons and summaries of selected web mining software. The software packages selected for discussion and comparison are SPSS Clementine, Megaputer PolyAnalyst, ClickTracks by web analytics, and QL2 by QL2 Software Inc. Applications of these software packages to available data sets are discussed together with numerous screen shots, as well as conclusions and future directions of the research.
Pages: 683-720
Page count: 38