Punctuation as Implicit Annotations for Chinese Word Segmentation

Cited by: 110

Authors:

Li, Zhongguo [1]

Sun, Maosong [1]

Affiliations:

[1] Tsinghua University, Department of Computer Science and Technology, Beijing 100084, People's Republic of China

Source:

COMPUTATIONAL LINGUISTICS | 2009 / Vol. 35 / Issue 04

Funding:

National Science Foundation (United States)

Keywords:

DOI:

10.1162/coli.2009.35.4.35403

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We present a Chinese word segmentation model learned from punctuation marks, which are perfect word delimiters. The learning is aided by a manually segmented corpus. Our method is considerably more effective than previous methods in unknown word recognition. This is a step toward addressing one of the toughest problems in Chinese word segmentation.
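The idea that punctuation marks act as word delimiters can be illustrated with a minimal sketch. It is not the authors' model: it only assumes that a character directly after a punctuation mark (or at the start of the text) must begin a word, and that a character directly before one (or at the end of the text) must end a word. The function name `boundary_examples`, the labels `B` and `E`, and the small `PUNCTUATION` set are hypothetical choices made for illustration.

```python
# Hypothetical punctuation inventory (an assumption for this sketch);
# a real system would use a fuller list.
PUNCTUATION = set("，。！？；：、")


def boundary_examples(text):
    """Collect (character, label) pairs implied by punctuation.

    'B' marks a character that must begin a word because it follows a
    punctuation mark or starts the text. 'E' marks a character that must
    end a word because it precedes a punctuation mark or ends the text.
    Characters with no punctuation neighbour yield no example.
    """
    examples = []
    last = len(text) - 1
    for i, ch in enumerate(text):
        if ch in PUNCTUATION:
            continue  # punctuation itself is not a training example
        if i == 0 or text[i - 1] in PUNCTUATION:
            examples.append((ch, "B"))
        if i == last or text[i + 1] in PUNCTUATION:
            examples.append((ch, "E"))
    return examples


if __name__ == "__main__":
    for ch, label in boundary_examples("我爱北京，天安门。"):
        print(ch, label)
```

Examples gathered this way come from unsegmented text at no annotation cost. How the paper combines them with the manually segmented corpus is not described in the abstract and is not modelled here.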

Pages: 505-512

Page count: 8

