Improving the Extraction of Bilingual Terminology from Wikipedia

Cited by: 29
Authors
Erdmann, Maike [1]
Nakayama, Kotaro [2]
Hara, Takahiro [1]
Nishio, Shojiro [1]
Affiliations
[1] Osaka University, Suita, Osaka 565, Japan
[2] University of Tokyo, Tokyo 113-8654, Japan
Keywords
Algorithms; Experimentation; Bilingual dictionary; Wikipedia mining; Link analysis
DOI
10.1145/1596990.1596995
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Research on the automatic construction of bilingual dictionaries has achieved impressive results. Bilingual dictionaries are usually constructed from parallel corpora, but because such corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. In this article, we further pursue the idea of using Wikipedia as a corpus for bilingual terminology extraction. We propose a method that extracts term-translation pairs from different types of Wikipedia link information. An SVM classifier, trained on features of manually labeled training data, then determines whether unseen term-translation pairs are correct.
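The abstract outlines a two-stage pipeline: harvest candidate term-translation pairs from Wikipedia link information, then let an SVM accept or reject each pair. The Python sketch below illustrates only that shape, under explicit assumptions. Each article is a hypothetical in-memory record `(title, redirect titles, anchor-text counts)`. The two records are assumed to be joined by an interlanguage link. The link types used (titles, redirects, anchor texts) and the features in `side_features` are illustrative, not the authors' feature set. The classifier is a Pegasos-style linear SVM (stochastic sub-gradient descent on the regularized hinge loss), written in plain Python so the example needs no third-party library. The abstract does not say which SVM implementation or kernel was used.

```python
import math
import random
from collections import Counter

# Hypothetical record layout (an assumption, not from the article):
# (title, list of redirect titles, Counter of link texts pointing to the article).

def name_sources(title, redirects, anchor_counts):
    """Map every surface form of one article to the link types it came from."""
    sources = {title: {"title"}}
    for name in redirects:
        sources.setdefault(name, set()).add("redirect")
    for name in anchor_counts:
        sources.setdefault(name, set()).add("anchor")
    return sources

def side_features(name, sources, anchor_counts):
    """Illustrative features: three link-type indicators plus log anchor frequency."""
    kinds = sources[name]
    return [float("title" in kinds), float("redirect" in kinds),
            float("anchor" in kinds), math.log1p(anchor_counts.get(name, 0))]

def candidate_pairs(src, tgt):
    """Pair every surface form of src with every surface form of tgt.

    src and tgt are records assumed to be joined by an interlanguage link.
    Returns {(source_term, target_term): 8-dimensional feature vector}.
    """
    src_sources, tgt_sources = name_sources(*src), name_sources(*tgt)
    return {
        (a, b): side_features(a, src_sources, src[2]) + side_features(b, tgt_sources, tgt[2])
        for a in src_sources
        for b in tgt_sources
    }

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style SGD on the L2-regularized hinge loss; labels must be +1/-1.

    A constant 1.0 is appended to each vector so the last weight acts as a bias
    (it is regularized as well, which is acceptable for a sketch).
    """
    rng = random.Random(seed)
    data = [x + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization (shrink) step
            if margin < 1.0:  # hinge loss active: step toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w

def predict(w, x):
    """Return +1 (accept the term-translation pair) or -1 (reject it)."""
    score = sum(wj * xj for wj, xj in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1
```

For example, calling `candidate_pairs` on a made-up English record with three surface forms and a made-up Japanese record with two yields six candidate pairs, each with an 8-dimensional feature vector. `train_linear_svm` would then be fit on pairs that a human has labeled +1 or -1. Pairing every name with every other name deliberately over-generates candidates and leaves the filtering to the classifier. This is a design choice of the sketch, not a claim about the article's method.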
Pages: 17