Mining atomic Chinese abbreviations with a probabilistic single character recovery model

Cited by: 2
Authors
Jing-Shin Chang
Wei-Lun Teng
Affiliation
[1] National Chi-Nan University, Department of Computer Science & Information Engineering
Source
Language Resources and Evaluation | 2006 / Vol. 40
Keywords
Abbreviation; Atomic abbreviation; Single character recovery model
DOI
Not available
Abstract
An HMM-based single character recovery (SCR) model is proposed in this paper to extract a large set of atomic abbreviations and their full forms from a text corpus. An “atomic abbreviation” is an abbreviated word consisting of a single Chinese character. The task is important because Chinese abbreviations cannot be enumerated exhaustively, whereas the abbreviation process for compound words appears to be compositional: an abbreviated word can often be decoded character by character into its full form. With a large dictionary of atomic abbreviations, multi-character abbreviations may therefore be handled more easily by exploiting this compositional property.
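The character-by-character decoding described in the abstract can be illustrated as HMM tagging: each character of the abbreviation is an observation c_i, and the hidden state is the full-form word w_i it abbreviates, so a candidate recovery is scored by the product of P(c_i | w_i) and P(w_i | w_{i-1}) and the best one is found with Viterbi decoding. The sketch below is a minimal illustration under stated assumptions, not the authors' SCR implementation. The names `EMIT`, `TRANS`, `candidates`, and `recover` are hypothetical, and every probability is a made-up toy value, not an estimate from any corpus. The example input 台大 is a common abbreviation of 台灣大學 (National Taiwan University).

```python
import math

# HYPOTHETICAL toy parameters, not estimated from any corpus.
# Emission P(c | w): probability that full-form word w is abbreviated to character c.
EMIT = {
    "台灣": {"台": 0.9},
    "台北": {"台": 0.5, "北": 0.5},
    "大學": {"大": 0.8},
    "大型": {"大": 0.6},
}
# Transition P(w_i | w_{i-1}) between full-form words; "<s>" marks the start.
TRANS = {
    "<s>": {"台灣": 0.5, "台北": 0.5},
    "台灣": {"大學": 0.7, "大型": 0.3},
    "台北": {"大學": 0.4, "大型": 0.6},
}


def candidates(char):
    """Return the full-form words that can be abbreviated to this single character."""
    return [w for w, emits in EMIT.items() if char in emits]


def recover(abbrev, floor=1e-9):
    """Viterbi decoding: tag each character of `abbrev` with its most likely full-form word.

    `floor` is an assumed stand-in for smoothing of unseen word transitions.
    Returns the best word sequence, or None if some character has no candidate.
    """
    # best[w] = (log-probability of the best path ending in word w, that path)
    best = {"<s>": (0.0, [])}
    for ch in abbrev:
        nxt = {}
        for w in candidates(ch):
            emit = math.log(EMIT[w][ch])
            # Extend every surviving path by w and keep the highest-scoring one.
            nxt[w] = max(
                (score + math.log(TRANS.get(prev, {}).get(w, floor)) + emit, path + [w])
                for prev, (score, path) in best.items()
            )
        if not nxt:
            return None
        best = nxt
    return max(best.values())[1]


print(recover("台大"))  # ['台灣', '大學']
```

In a real system the emission and transition tables would have to be estimated from a corpus; here the path 台灣 → 大學 wins because its toy score (0.5 × 0.9 × 0.7 × 0.8 = 0.252) exceeds every alternative.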
Pages: 367-374
Page count: 7