Aspects of Swedish morphology and semantics from the perspective of mono- and cross-language information retrieval

Cited by: 23
Authors
Hedlund, T [1]
Pirkola, A [1]
Järvelin, K [1]
Affiliation
[1] Univ Tampere, Dept Informat Studies, FIN-33101 Tampere, Finland
Keywords
text retrieval; cross-language information retrieval; Swedish language; natural language processing
DOI
10.1016/S0306-4573(00)00024-8
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper analyzes the features of the Swedish language from the viewpoint of mono- and cross-language information retrieval (CLIR). The study was motivated by the fact that Swedish is known poorly from the IR perspective. This paper shows that Swedish has unique features, in particular gender features, the use of fogemorphemes in the formation of compound words, and a high frequency of homographic words. Especially in dictionary-based CLIR, correct word normalization and compound splitting are essential. It was shown in this study, however, that publicly available morphological analysis tools used for normalization and compound splitting have pitfalls that might decrease the effectiveness of IR and CLIR. A comparative study was performed to test the degree of lexical ambiguity in Swedish, Finnish and English. The results suggest that part-of-speech tagging might be useful in Swedish IR due to the high frequency of homographic words. (C) 2000 Elsevier Science Ltd. All rights reserved.
Pages: 147-161
Number of pages: 15
Related papers
43 in total
[11]  
Davis MW, 1997, PROCEEDINGS OF THE 20TH ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, P92, DOI 10.1145/278459.258542
[12]  
Efthimiadis EN, 1996, ANNU REV INFORM SCI, V31, P121
[13]  
FJELDVIG T, 1988, NORDIC J LINGUISTICS, V11, P33
[14]  
FRAURUD K, 1988, NORD J LINGUIST, V11, P47
[15]  
Grishman R., 1986, COMPUTATIONAL LINGUI
[16]  
Guthrie JA, 1991, P 29 ANN M ASS COMP, P146
[17]   The role of lexicons in natural language processing [J].
Guthrie, L ;
Pustejovsky, J ;
Wilks, Y ;
Slator, BM .
COMMUNICATIONS OF THE ACM, 1996, 39 (01) :63-72
[18]  
HARMAN D, 1991, J AM SOC INFORM SCI, V42, P7, DOI 10.1002/(SICI)1097-4571(199101)42:1<7::AID-ASI2>3.0.CO;2-P
[20]  
HELLBERG S, 1978, MORPHOLOGY PRESENT D