A text mining approach for automatic construction of hypertexts

Cited by: 22
Authors
Yang, HC [1]
Lee, CH
Affiliations
[1] Chang Jung Univ, Dept Informat Management, Tainan 711, Taiwan
[2] Natl Kaohsiung Univ Appl Sci, Dept Elect Engn, Kaohsiung, Taiwan
Keywords
automatic hypertext construction; self-organizing maps; text mining
DOI
10.1016/j.eswa.2005.05.003
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Research on automatic hypertext construction has grown rapidly over the last decade because there is an urgent need to translate the gigantic amount of legacy documents into web pages. Unlike traditional 'flat' texts, a hypertext contains a number of navigational hyperlinks that point to related hypertexts or to other locations within the same hypertext. Traditionally, these hyperlinks were constructed by the creators of the web pages, with or without the help of authoring tools. However, the gigantic amount of documents produced each day makes such manual construction impractical. An automatic hypertext construction method is therefore necessary for content providers to efficiently produce adequate information that can be used by web surfers. Although most web pages contain non-textual data such as images, sounds, and video clips, text data still contribute the major part of the information about the pages. It is therefore not surprising that most automatic hypertext construction methods derive from traditional information retrieval research. In this work, we propose a new automatic hypertext construction method based on a text mining approach. Our method applies the self-organizing map algorithm to cluster flat text documents in a training corpus and generate two maps. We then use these maps to identify the sources and destinations of important hyperlinks within these training documents. The constructed hyperlinks are then inserted into the training documents to translate them into hypertext form. These translated documents form the new corpus. Incoming documents can also be translated into hypertext form and added to the corpus through the same approach. Our method has been tested on a set of flat text documents collected from a newswire site. Although we only use Chinese text documents, our approach can be applied to any documents that can be transformed into a set of index terms. (c) 2005 Elsevier Ltd. All rights reserved.
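The clustering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each document has already been reduced to a fixed-length vector of index-term weights, and it trains a small rectangular self-organizing map on those vectors. The function names (`train_som`, `best_matching_unit`, `cluster_documents`), the map size, the decay schedules, and the Gaussian neighborhood are illustrative assumptions, not details taken from the paper. The sketch does not reproduce the paper's two maps or its hyperlink identification step.

```python
import math
import random


def best_matching_unit(weights, v):
    """Return the grid position whose weight vector is closest to v."""
    return min(
        weights,
        key=lambda pos: sum((a - b) ** 2 for a, b in zip(weights[pos], v)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM on equal-length numeric vectors.

    All hyperparameters here are assumed defaults for illustration only.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per map neuron, randomly initialised in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
        for v in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, v)
            for pos, w in weights.items():
                grid_dist = math.hypot(pos[0] - bmu[0], pos[1] - bmu[1])
                if grid_dist <= radius:
                    # Gaussian neighbourhood: nearer neurons move more.
                    h = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                    for i in range(dim):
                        w[i] += lr * h * (v[i] - w[i])
    return weights


def cluster_documents(weights, vectors):
    """Group document indices by the neuron each document maps to."""
    clusters = {}
    for idx, v in enumerate(vectors):
        clusters.setdefault(best_matching_unit(weights, v), []).append(idx)
    return clusters


if __name__ == "__main__":
    # Toy binary index-term vectors (hypothetical data, not from the paper).
    docs = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    som = train_som(docs, rows=2, cols=2)
    print(cluster_documents(som, docs))
```

Documents that land on the same neuron share similar index terms, which is the kind of grouping a link-construction step could then draw on.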
Pages: 723-734
Number of pages: 12