Using corpus-based approaches in a system for multilingual information retrieval

Cited by: 15
Authors
Braschler, M [1]
Schäuble, P [1]
Affiliation
[1] Eurospider Informat Technol AG, CH-8006 Zurich, Switzerland
Source
INFORMATION RETRIEVAL | 2000, Vol. 3, No. 3
Keywords
multilingual information retrieval; cross-language information retrieval; corpus-based approaches; document alignments
DOI
10.1023/A:1026525127581
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present a system for multilingual information retrieval that allows users to formulate queries in their preferred language and retrieve relevant information from a collection containing documents in multiple languages. The system is based on a process of document level alignments, where documents of different languages are paired according to their similarity. The resulting mapping allows us to produce a multilingual comparable corpus. Such a corpus has multiple interesting applications. It allows us to build a data structure for query translation in cross-language information retrieval (CLIR). Moreover, we also perform pseudo relevance feedback on the alignments to improve our retrieval results. And finally, multiple retrieval runs can be merged into one unified result list. The resulting system is inexpensive, adaptable to domain-specific collections and new languages and has performed very well at the TREC-7 conference CLIR system comparison.
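
The document-level alignment step described in the abstract, where documents in different languages are paired by similarity, can be illustrated with a toy sketch. This is not the authors' implementation: the function names `cosine` and `align_documents`, the `threshold` parameter, and the assumption that documents are compared on language-independent tokens (such as names and numbers) are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def align_documents(source_docs, target_docs, threshold=0.2):
    """Pair each source document with its most similar target document.

    Both arguments map a document id to a list of tokens. The tokens are
    assumed (hypothetically) to be comparable across languages. Pairs whose
    best similarity does not exceed `threshold` are left unaligned.
    """
    target_vectors = {doc_id: Counter(toks) for doc_id, toks in target_docs.items()}
    alignments = {}
    for src_id, tokens in source_docs.items():
        src_vec = Counter(tokens)
        best_id, best_score = None, threshold
        for tgt_id, tgt_vec in target_vectors.items():
            score = cosine(src_vec, tgt_vec)
            if score > best_score:
                best_id, best_score = tgt_id, score
        if best_id is not None:
            alignments[src_id] = (best_id, best_score)
    return alignments


# Toy data: two German-side and two English-side documents (made up).
source_docs = {"de1": ["zurich", "1998", "trec"], "de2": ["bern", "2000"]}
target_docs = {"en1": ["zurich", "1998", "conference"], "en2": ["paris", "1999"]}
alignments = align_documents(source_docs, target_docs)
```

In this toy data, `de1` and `en1` share two of three tokens and are paired, while `de2` has no overlap with any target document and stays unaligned. The resulting pairs form a small comparable corpus of the kind the abstract says is used for query translation and pseudo relevance feedback.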
Pages: 273-284
Page count: 12