Bilingual terminology extraction using multi-level termhood

Cited by: 7
Authors
Zhang, Chengzhi [1]
Wu, Dan [2]
Affiliations
[1] Nanjing Univ Sci & Technol, Dept Informat Management, Nanjing, Jiangsu, Peoples R China
[2] Wuhan Univ, Sch Informat Management, Wuhan 430072, Peoples R China
Keywords
Bilingual terminology extraction; Multi-level termhood; Corpus comparison; Bilingual terminology alignment; Languages; Translation services
DOI
10.1108/02640471211221395
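Chinese Library Classification number
G25 [Library science, librarianship]; G35 [Information science, information work]
Subject classification code
1205; 120501
Abstract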
Purpose: Terminology is the set of technical words or expressions used in specific contexts. It denotes the core concepts of a formal discipline and is commonly applied in machine translation, information retrieval, information extraction and text categorization. Bilingual terminology extraction plays an important role in bilingual dictionary compilation, bilingual ontology construction, machine translation and cross-language information retrieval. This paper aims to address the issues of monolingual terminology extraction and bilingual term alignment based on multi-level termhood.

Design/methodology/approach: A method based on multi-level termhood is proposed. The new method computes, by corpus comparison, the termhood of the terminology candidate as well as that of the sentence that includes the candidate. Since terminologies and general words usually have different distributions in the corpus, termhood can also be used to constrain and enhance term alignment when aligning bilingual terms on a parallel corpus. In this paper, bilingual term alignment based on termhood constraints is presented.

Findings: Experimental results show that multi-level termhood performs better than the existing method for terminology extraction. If termhood is used as a constraining factor, the performance of bilingual term alignment can be improved.

Originality/value: The termhood of the candidate terminology and of the sentence that includes it is used for terminology extraction, which is called multi-level termhood. Multi-level termhood is computed by corpus comparison. A bilingual term alignment method based on termhood constraints is put forward, and termhood is used in the task of bilingual terminology extraction. Experimental results show that termhood constraints can improve the performance of terminology alignment to some extent.
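The abstract does not give the scoring formulas, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes termhood is a smoothed log ratio of relative frequencies between a domain corpus and a general corpus, that the two levels are combined by a weighted average, and that alignment uses a Dice coefficient over parallel sentence pairs restricted to candidates that pass a termhood threshold. All function names, the `weight` and `threshold` parameters, and the choice of log ratio and Dice are illustrative stand-ins, not the authors' method.

```python
import math
from collections import Counter


def word_termhood(domain_tokens, general_tokens, smoothing=1.0):
    """Assumed measure: smoothed log ratio of relative frequencies
    (domain corpus vs. general corpus). Positive means domain-specific."""
    dom, gen = Counter(domain_tokens), Counter(general_tokens)
    vocab = set(dom) | set(gen)
    nd = len(domain_tokens) + smoothing * len(vocab)
    ng = len(general_tokens) + smoothing * len(vocab)
    return {
        w: math.log(((dom[w] + smoothing) / nd) / ((gen[w] + smoothing) / ng))
        for w in vocab
    }


def mean_score(tokens, scores):
    """Average word-level termhood over a token list (0.0 if empty)."""
    if not tokens:
        return 0.0
    return sum(scores.get(w, 0.0) for w in tokens) / len(tokens)


def contains(tokens, term_tokens):
    """True if term_tokens occurs as a contiguous run inside tokens."""
    n = len(term_tokens)
    return n > 0 and any(
        tokens[i:i + n] == term_tokens for i in range(len(tokens) - n + 1)
    )


def multi_level_termhood(candidate, sentences, scores, weight=0.5):
    """Hypothetical combination: weighted average of the candidate's own
    score and the mean score of the sentences that contain it."""
    cand_tokens = candidate.split()
    cand_score = mean_score(cand_tokens, scores)
    hosts = [s for s in sentences if contains(s, cand_tokens)]
    sent_score = (
        sum(mean_score(s, scores) for s in hosts) / len(hosts) if hosts else 0.0
    )
    return weight * cand_score + (1.0 - weight) * sent_score


def filter_by_termhood(candidates, scores, threshold=0.0):
    """Termhood constraint: keep candidates scoring above the threshold."""
    return [c for c in candidates if mean_score(c.split(), scores) > threshold]


def align_terms(pairs, src_terms, tgt_terms):
    """Align each source term to the target term with the highest Dice
    coefficient over parallel sentence pairs (src_tokens, tgt_tokens)."""
    result = {}
    for s in src_terms:
        s_tok = s.split()
        n_s = sum(contains(src, s_tok) for src, _ in pairs)
        best_t, best_dice = None, 0.0
        for t in tgt_terms:
            t_tok = t.split()
            n_t = sum(contains(tgt, t_tok) for _, tgt in pairs)
            n_st = sum(
                contains(src, s_tok) and contains(tgt, t_tok) for src, tgt in pairs
            )
            dice = 2.0 * n_st / (n_s + n_t) if (n_s + n_t) else 0.0
            if dice > best_dice:
                best_t, best_dice = t, dice
        if best_t is not None:
            result[s] = (best_t, best_dice)
    return result
```

In this sketch the constraint is applied by passing the output of `filter_by_termhood` for each language as `src_terms` and `tgt_terms`, so that general words never enter the alignment step. Other ways of applying the constraint, such as weighting the association score by termhood, would fit the abstract equally well.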
Pages: 295-308
Page count: 14