Punctuation as Implicit Annotations for Chinese Word Segmentation

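Times cited: 110
Authors: Li, Zhongguo [1]; Sun, Maosong [1]
Affiliation: [1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
Funding: U.S. National Science Foundation
DOI: 10.1162/coli.2009.35.4.35403
Chinese Library Classification: TP18 [Artificial intelligence theory]
Discipline classification codes: 081104; 0812; 0835; 1405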
Abstract
We present a Chinese word segmentation model learned from punctuation marks, which are perfect word delimiters. The learning is aided by a manually segmented corpus. Our method is considerably more effective than previous methods in unknown word recognition. This is a step toward addressing one of the toughest problems in Chinese word segmentation.
Pages: 505-512
Page count: 8
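The abstract rests on one observation: a punctuation mark never sits inside a word. That makes it a free boundary annotation in raw, unsegmented text. The character right after a mark must begin a word, and the character right before a mark must end one.

The sketch below illustrates only that observation. It is not the authors' model. Several details are hypothetical and chosen for illustration:
- the delimiter set `PUNCTUATION`;
- the function name `implicit_tags`;
- the B/E/S/? tag names, which borrow a common character-tagging convention.

The sketch deliberately uses punctuation alone, so characters at the very start or end of the string stay untagged.

```python
# Hypothetical delimiter set; the paper's actual inventory of punctuation marks may differ.
PUNCTUATION = set("，。、；：？！“”‘’（）《》")


def implicit_tags(text: str) -> list[tuple[str, str]]:
    """Derive partial word-boundary tags for the non-punctuation characters of text.

    A character right after a punctuation mark must begin a word; a character
    right before one must end a word. Tags (an assumed convention):
    "B" = begins a word, "E" = ends a word, "S" = both (a one-character word),
    "?" = punctuation gives no boundary information for this character.
    """
    tags = []
    for i, ch in enumerate(text):
        if ch in PUNCTUATION:
            continue  # delimiters are not part of any word, so they get no tag
        starts = i > 0 and text[i - 1] in PUNCTUATION
        ends = i + 1 < len(text) and text[i + 1] in PUNCTUATION
        if starts and ends:
            tag = "S"
        elif starts:
            tag = "B"
        elif ends:
            tag = "E"
        else:
            tag = "?"
        tags.append((ch, tag))
    return tags


if __name__ == "__main__":
    print(implicit_tags("我们，学习。"))
    # → [('我', '?'), ('们', 'E'), ('学', 'B'), ('习', 'E')]
```

Most characters in real text come out as "?", which is why such labels are only partial. The abstract states that learning from punctuation is aided by a manually segmented corpus. This sketch does not show how the authors combine the two sources.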