Predicting Chinese Abbreviations from Definitions: An Empirical Learning Approach Using Support Vector Regression

Cited by: 8
Authors
孙栩 [1]
王厚峰 [1]
王波 [1]
Affiliations
[1] Institute of Computational Linguistics, School of Electronics Engineering and Computer Science
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
statistical natural language processing; abbreviation prediction; support vector regression; word clustering
DOI
Not available
Chinese Library Classification number
TP391.1 [Text information processing]
Discipline classification code
081203; 0835
Abstract
In Chinese, phrases and named entities play a central role in information retrieval. Abbreviations, however, make keyword-based approaches less effective. This paper presents an empirical learning approach to Chinese abbreviation prediction. In this study, each abbreviation is taken as a reduced form of the corresponding definition (expanded form), and the abbreviation prediction is formalized as a scoring and ranking problem among abbreviation candidates, which are automatically generated from the corresponding definition. By employing Support Vector Regression (SVR) for scoring, we can obtain multiple abbreviation candidates together with their SVR values, which are used for candidate ranking. Experimental results show that the SVR method performs better than the popular heuristic rule of abbreviation prediction. In addition, in abbreviation prediction, the SVR method outperforms the hidden Markov model (HMM).
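The candidate-generation and ranking formulation in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: candidates are taken to be order-preserving character subsequences of the definition, and `toy_score` is a hypothetical stand-in for the trained SVR model, whose features and training data the abstract does not describe. The function names and the example definition 北京大学 (commonly abbreviated 北大) are likewise not from the paper.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def generate_candidates(definition: str, min_len: int = 2) -> List[str]:
    """Enumerate order-preserving character subsequences of the definition.

    Assumption: a candidate is a proper subsequence of the definition's
    characters; the paper's actual generation procedure may differ.
    """
    n = len(definition)
    seen = set()
    candidates = []
    for k in range(min_len, n):  # k < n, so the full definition is excluded
        for idx in combinations(range(n), k):
            cand = "".join(definition[i] for i in idx)
            if cand not in seen:
                seen.add(cand)
                candidates.append(cand)
    return candidates


def rank_candidates(
    definition: str,
    score: Callable[[str, str], float],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score every candidate and return the top_n, highest score first."""
    scored = [(c, score(definition, c)) for c in generate_candidates(definition)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def toy_score(definition: str, candidate: str) -> float:
    """Hypothetical stand-in for the regression value of a trained SVR.

    It rewards keeping the first character and prefers two-character
    candidates; a real system would learn the scoring function from data.
    """
    score = 0.0
    if candidate[0] == definition[0]:
        score += 1.0
    score -= 0.1 * abs(len(candidate) - 2)
    return score


if __name__ == "__main__":
    for cand, value in rank_candidates("北京大学", toy_score):
        print(cand, round(value, 2))
```

Passing the scorer as a parameter keeps generation and ranking independent of the model, so a trained regressor could replace `toy_score` without changing the rest of the sketch.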
Pages: 602-611
Number of pages: 10
Related papers
4 items in total
  • [1] Mining atomic Chinese abbreviations with a probabilistic single character recovery model
    Jing-Shin Chang
    Wei-Lun Teng
    [J]. Language Resources and Evaluation, 2006, 40 : 367 - 374
  • [2] A tutorial on support vector regression
    Smola, AJ
    Schölkopf, B
    [J]. STATISTICS AND COMPUTING, 2004, 14 (03) : 199 - 222
  • [3] PNAD-CSS: a workbench for constructing a protein name abbreviation dictionary
    Mikio Yoshida
    Ken-ichiro Fukuda
    [J]. Bioinformatics, 2000
  • [4] Estimation of probabilities from sparse data for the language model component of a speech recogniser
    S. M. Katz
    [J]. IEEE Trans ASSP, 1987