Combining trigram and automatic weight distribution in Chinese spelling error correction

Cited by: 12
Authors
Li, JH [1 ]
Wang, XL [1 ]
Affiliation
[1] Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
spelling error correction; language model; edit distance; weight distribution
DOI
10.1007/BF02960784
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Research on spelling correction that aims to detect errors in text tends to focus on context-sensitive spelling error correction, which is more difficult than traditional isolated-word error correction. This paper presents a novel and efficient algorithm for CInsunSpell, a Chinese spelling error correction system. Correction in this system is divided into two phases: a checking phase and a correcting phase. In the checking phase, a trigram algorithm operating within a fixed-size window locates potential errors in a local area. The correcting phase employs a new method that automatically and dynamically distributes weights among the characters in the confusion set as well as in the Bayesian language model. These strategies exhibit good performance.
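The two phases described in the abstract can be illustrated with a small Python sketch. This is not the CInsunSpell implementation: the abstract does not give the detection criterion, the smoothing, or the weighting formula, so all of these are assumptions here. A character is flagged when none of the three trigrams covering it (a 5-character window) was seen in training; candidates come from a hypothetical confusion set; and a simple frequency-based prior stands in for the paper's automatic weight distribution. The functions `train`, `detect`, `correct`, the padding symbols, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical sentence-boundary padding so every character has three covering trigrams.
PAD_L, PAD_R = "^^", "$$"


def train(corpus):
    """Count character trigrams, their two-character contexts, and unigrams."""
    tri, ctx, uni = Counter(), Counter(), Counter()
    for sent in corpus:
        uni.update(sent)
        padded = PAD_L + sent + PAD_R
        for k in range(len(padded) - 2):
            tri[padded[k:k + 3]] += 1
            ctx[padded[k:k + 2]] += 1
    vocab_size = len(set(uni) | set(PAD_L + PAD_R))
    return tri, ctx, uni, vocab_size


def covering_trigrams(sent, i):
    """The three trigrams containing sent[i], i.e. a 5-character window centered on i."""
    padded = PAD_L + sent + PAD_R
    return [padded[k:k + 3] for k in range(i, i + 3)]


def detect(sent, tri):
    """Checking phase: flag position i if no trigram in its window was seen in training."""
    return [i for i in range(len(sent))
            if all(tri[t] == 0 for t in covering_trigrams(sent, i))]


def log_prob(trigram, tri, ctx, vocab_size):
    """Add-one smoothed log P(c | ab) for the trigram 'abc'."""
    return math.log((tri[trigram] + 1) / (ctx[trigram[:2]] + vocab_size))


def correct(sent, confusion, model):
    """Correcting phase: rescore each flagged character against its confusion set."""
    tri, ctx, uni, vocab_size = model
    chars = list(sent)
    for i in detect(sent, tri):
        # Keep the original character as a candidate alongside its confusion set.
        candidates = [sent[i]] + confusion.get(sent[i], [])
        # Stand-in for automatic weight distribution (assumption): a prior
        # proportional to each candidate's add-one corpus frequency.
        total = sum(uni[c] + 1 for c in candidates)

        def score(c):
            trial = sent[:i] + c + sent[i + 1:]
            lm = sum(log_prob(t, tri, ctx, vocab_size)
                     for t in covering_trigrams(trial, i))
            return lm + math.log((uni[c] + 1) / total)

        # Each flagged position is scored independently against the original context.
        chars[i] = max(candidates, key=score)
    return "".join(chars)


corpus = ["我们在学校学习", "他在学校上课", "学校很大"]  # toy training data
model = train(corpus)
confusion = {"效": ["校", "笑"]}  # hypothetical confusion set
print(detect("我们在学效学习", model[0]))           # → [4]
print(correct("我们在学效学习", confusion, model))  # → 我们在学校学习
```

Keeping the original character in the candidate list lets the model decline a correction when the flagged character still scores best under the combined language-model and weight score.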
Pages: 915-923
Number of pages: 9