Linguistic feature extraction using independent component analysis

Cited by: 8
Authors
Honkela, T [1]
Hyvärinen, A [1]
Affiliations
[1] Aalto Univ, Neural Networks Res Ctr, Lab Comp & Informat Sci, Helsinki, Finland
Source
2004 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS | 2004
DOI
10.1109/IJCNN.2004.1379914
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Our aim is to find syntactic and semantic relationships of words based on the analysis of corpora. We propose the application of independent component analysis, which seems to have clear advantages over two classic methods: latent semantic analysis and self-organizing maps. Latent semantic analysis is a simple method for automatic generation of concepts that are useful, e.g., in encoding documents for information retrieval purposes. However, these concepts cannot easily be interpreted by humans. Self-organizing maps can be used to generate an explicit diagram which characterizes the relationships between words. The resulting map reflects syntactic categories in the overall organization and semantic categories at the local level. The self-organizing map does not, however, provide any explicit distinct categories for the words. Independent component analysis applied to word context data gives distinct features which reflect syntactic and semantic categories. Thus, independent component analysis gives features or categories that are both explicit and easily interpreted by humans. This result can be obtained without any human supervision or tagged corpora that would have some predetermined morphological, syntactic or semantic information.
Pages: 279-284
Page count: 6