Self organization of a massive text document collection

Cited by: 49
Authors
Kohonen, T [1]
Kaski, S [1]
Lagus, K [1]
Salojärvi, J [1]
Honkela, J [1]
Paatero, V [1]
Saarela, A [1]
Affiliations
[1] Aalto Univ, Neural Networks Res Ctr, FIN-02015 Helsinki, Finland
Source
KOHONEN MAPS | 1999
DOI
10.1016/B978-044450270-4/50013-9
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
When the SOM is applied to the mapping of documents, one can represent them statistically by their weighted word frequency histograms or some reduced representations of the histograms that can be regarded as data vectors. We have made such a SOM of about seven million documents, viz. of all of the patent abstracts in the world that have been written in English and are available in electronic form. The map consists of about one million models (nodes). Keywords or key texts can be used to search for the most relevant documents first. New effective coding and computational schemes of the mapping are described.
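The idea in the abstract (documents as weighted word frequency histograms, mapped onto SOM nodes, then searched by key text) can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the chapter: the inverse-document-frequency weighting, the one-dimensional row of nodes, the Gaussian neighborhood, the learning-rate schedule, and all function names. It does not reproduce the coding and computational schemes the chapter describes for a million-node map.

```python
import math
import random
from collections import Counter


def build_vocab(docs):
    """Sorted list of all distinct lowercase words in the collection."""
    return sorted({w for d in docs for w in d.lower().split()})


def inverse_document_frequency(docs, vocab):
    """Assumed weighting: smoothed IDF, so rarer words weigh more."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d.lower().split()))
    return {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}


def weighted_histogram(doc, vocab, idf):
    """Weighted word frequency histogram, scaled to unit length."""
    counts = Counter(doc.lower().split())
    vec = [counts[w] * idf[w] for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def best_matching_unit(models, x):
    """Index of the model (node) closest to x in Euclidean distance."""
    return min(
        range(len(models)),
        key=lambda i: sum((m - v) ** 2 for m, v in zip(models[i], x)),
    )


def train_som(data, n_nodes=4, epochs=50, seed=0):
    """Train a tiny 1-D SOM: nodes sit on a line, neighbors learn together."""
    rng = random.Random(seed)
    dim = len(data[0])
    models = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs               # decays from 1 toward 0
        alpha = 0.5 * frac                        # learning rate
        sigma = 0.5 + (n_nodes / 2 - 0.5) * frac  # neighborhood width
        for x in rng.sample(data, len(data)):
            bmu = best_matching_unit(models, x)
            for i, m in enumerate(models):
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for k in range(dim):
                    m[k] += alpha * h * (x[k] - m[k])
    return models


if __name__ == "__main__":
    docs = [
        "engine piston fuel valve",
        "fuel engine combustion piston",
        "polymer resin coating film",
        "coating polymer film adhesive",
    ]
    vocab = build_vocab(docs)
    idf = inverse_document_frequency(docs, vocab)
    data = [weighted_histogram(d, vocab, idf) for d in docs]
    models = train_som(data)
    # Map each document, then a key-text query, to its best-matching node.
    for d, x in zip(docs, data):
        print(best_matching_unit(models, x), d)
    query = weighted_histogram("polymer film", vocab, idf)
    print("query node:", best_matching_unit(models, query))
```

In this sketch a search is a lookup of the query's best-matching node; the documents assigned to that node and its neighbors would be the candidates to show first.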
Pages: 171-182
Page count: 12