Improving self-organization of document collections by semantic mapping

Cited by: 11
Authors
Correa, Renato Fernandes
Ludermir, Teresa Bernarda
Affiliations
[1] Univ Fed Pernambuco, Ctr Informat, BR-50732970 Recife, PE, Brazil
[2] Univ Pernambuco, Polytech Sch, BR-50750410 Recife, PE, Brazil
Keywords
dimensionality reduction; semantic mapping; sparse random mapping; self-organizing map; document organization
DOI
10.1016/j.neucom.2006.07.007
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In text management tasks, dimensionality reduction becomes necessary for computation and for the interpretability of the results generated by machine learning algorithms. This paper describes a feature extraction method called semantic mapping. Semantic mapping, sparse random mapping and PCA are applied to the self-organization of document collections using the self-organizing map (SOM). The behaviors of the methods on the projection of binary and tf-idf document vector representations are compared. The classification error generated by SOM maps on text categorization of the K1 collection was used to compare the performance of the methods. Semantic mapping generated better document representations than sparse random mapping. © 2006 Elsevier B.V. All rights reserved.
Pages: 62-69
Page count: 8
Related papers
25 in total
[1]  
BINGHAM E, 2002, P SIGIR 02 TAMP FINL
[2]   Partitioning-based clustering for Web document categorization [J].
Boley, D ;
Gini, M ;
Gross, R ;
Han, EH ;
Hastings, K ;
Karypis, G ;
Kumar, V ;
Mobasher, B ;
Moore, J .
DECISION SUPPORT SYSTEMS, 1999, 27 (03) :329-341
[3]   Internet categorization and search: A self-organizing approach [J].
Chen, HC ;
Schuffels, C ;
Orwig, R .
JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 1996, 7 (01) :88-102
[4]  
CORREA RF, 2004, P 8 BRAZ S NEUR NETW, V1
[5]  
DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
[7]  
Forsythe G., 1977, Computer Methods for Mathematical Computations
[8]   Analysis of a complex of statistical variables into principal components [J].
Hotelling, H .
JOURNAL OF EDUCATIONAL PSYCHOLOGY, 1933, 24 :417-441
[9]  
Kaski S, 1998, IEEE WORLD CONGRESS ON COMPUTATIONAL INTELLIGENCE, P413, DOI 10.1109/IJCNN.1998.682302
[10]   Self organization of a massive document collection [J].
Kohonen, T ;
Kaski, S ;
Lagus, K ;
Salojärvi, J ;
Honkela, J ;
Paatero, V ;
Saarela, A .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 2000, 11 (03) :574-585