Dictionary-based cross-language information retrieval: Learning experiences from CLEF 2000-2002

Times cited: 28
Authors
Hedlund, T [1]
Airio, E [1]
Keskustalo, H [1]
Lehtokangas, R [1]
Pirkola, A [1]
Järvelin, K [1]
Affiliation
[1] University of Tampere, Department of Information Studies, FIN-33101 Tampere, Finland
Source
Information Retrieval, 2004, Vol. 7, No. 1-2
Keywords
cross-language information retrieval; compound handling; proper name matching; transitive CLIR; UTACLIR query translation system
DOI
10.1023/B:INRT.0000009442.34054.55
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This study presents the basic framework of the dictionary-based UTACLIR system and the results of a performance analysis covering its three-year development process. The tests expand in two directions. They move from bilingual CLIR for three language pairs (Swedish, Finnish and German to English) to six language pairs (English to French, German, Spanish, Italian, Dutch and Finnish). They also move from bilingual to multilingual retrieval. In addition, transitive translation tests are reported. The development of the UTACLIR query translation system is viewed as a learning process. The contributions of the individual components are analyzed, namely the effectiveness of compound handling, proper name matching and query structuring. The results and the fault analysis have been valuable in the development process. Overall, the results indicate that the process is robust and can be extended to other languages. The individual effects of the different components are generally positive. However, performance also depends on the topic set and on the number of compounds and proper names in the topics. To some extent, it also depends on the source and target languages. The dictionaries used have a significant effect on performance.
Pages: 99-119
Number of pages: 21