Document categorization and query generation on the World Wide Web using WebACE

Cited by: 78
Authors
Boley, D [1]
Gini, M
Gross, R
Han, EH
Hastings, K
Karypis, G
Kumar, V
Mobasher, B
Moore, J
Affiliations
[1] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
[2] Silicon Graph Inc, Eagan, MN USA
[3] Univ Minnesota, Dept Comp Sci, Minneapolis, MN 55455 USA
[4] USN, Washington, DC 20350 USA
[5] Depaul Univ, Ctr Web Data Min E Commerce, Chicago, IL 60604 USA
[6] Minneapolis Star Tribune, Minneapolis, MN USA
Keywords
clustering; divisive partitioning; graph partitioning; principal component analysis; web documents
DOI
10.1023/A:1006592405320
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present WebACE, an agent for exploring and categorizing documents on the World Wide Web based on a user profile. The heart of the agent is an unsupervised categorization of a set of documents, combined with a process for generating new queries that is used to search for new related documents and for filtering the resulting documents to extract the ones most closely related to the starting set. The document categories are not given a priori. We present the overall architecture and describe two novel algorithms which provide significant improvement over Hierarchical Agglomeration Clustering and AutoClass algorithms and form the basis for the query generation and search component of the agent. We report on the results of our experiments comparing these new algorithms with more traditional clustering algorithms, and we show that our algorithms are fast and scalable.
Pages: 365-391
Page count: 27