A Joint Model to Identify and Align Bilingual Named Entities

Cited by: 17
Authors
Chen, Yufeng [1]
Zong, Chengqing [1]
Su, Keh-Yih [2]
Affiliations
[1] Chinese Academy of Sciences, National Laboratory of Pattern Recognition, Institute of Automation, Beijing 100190, People's Republic of China
[2] Behavior Design Corporation, Hsinchu, Taiwan
DOI
10.1162/COLI_a_00122
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification code
140502 [Artificial intelligence]
Abstract
In this article, an integrated model is derived that jointly identifies and aligns bilingual named entities (NEs) between Chinese and English. The model is motivated by the following observations: (1) whether an NE is translated semantically or phonetically depends greatly on its entity type, (2) entities within an aligned pair should share the same type, and (3) the initially detected NEs can act as anchors and provide further information while selecting NE candidates. Based on these observations, this article proposes a translation mode ratio feature (defined as the proportion of NE internal tokens that are semantically translated), enforces an entity type consistency constraint, and utilizes additional new NE likelihoods (based on the initially detected NE anchors). Experiments show that this novel method significantly outperforms the baseline. The type-insensitive F-score of identified NE pairs increases from 78.4% to 88.0% (12.2% relative improvement) in our Chinese-English NE alignment task, and the type-sensitive F-score increases from 68.4% to 83.0% (21.3% relative improvement). Furthermore, the proposed model demonstrates its robustness when it is tested across different domains. Finally, when semi-supervised learning is conducted to train the adopted English NE recognition model, the proposed model also significantly boosts the English NE recognition type-sensitive F-score.
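The abstract defines the translation mode ratio as the proportion of NE internal tokens that are semantically translated, and it reports relative improvements over the baseline F-scores. The following is a minimal sketch of both calculations, not the authors' implementation. The function names, the `"semantic"` / `"phonetic"` labels, and the three-token example are assumptions made for illustration; only the F-score figures come from the abstract.

```python
def translation_mode_ratio(token_modes):
    """Fraction of NE-internal tokens labeled as semantically translated.

    `token_modes` is a hypothetical per-token labeling, e.g.
    ["semantic", "phonetic"]; the labels are an assumption of this sketch.
    """
    if not token_modes:
        raise ValueError("an NE must contain at least one token")
    semantic = sum(1 for mode in token_modes if mode == "semantic")
    return semantic / len(token_modes)


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Hypothetical NE with three internal tokens: two translated by meaning, one by sound.
ratio = translation_mode_ratio(["semantic", "semantic", "phonetic"])

# F-scores as reported in the abstract.
type_insensitive_gain = relative_improvement(78.4, 88.0)
type_sensitive_gain = relative_improvement(68.4, 83.0)

print(f"ratio={ratio:.3f}")
print(f"type-insensitive gain={type_insensitive_gain:.1f}%")
print(f"type-sensitive gain={type_sensitive_gain:.1f}%")
```

Rounded to one decimal place, the two gains match the 12.2% and 21.3% relative improvements stated in the abstract.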
Pages: 229-266
Page count: 38