A Character-Based Joint Model for Chinese Word Segmentation, POS Tagging, and Dependency Parsing

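Cited by: 14
Authors
Guo Zhen (郭振)
Zhang Yujie (张玉洁)
Su Chen (苏晨)
Xu Jin'an (徐金安)
Affiliation
[1] School of Computer and Information Technology, Beijing Jiaotong University
Keywords
joint model; Chinese word segmentation and POS tagging; dependency parsing; word-internal dependency structure; semi-supervised learning
DOI
Not available
Chinese Library Classification number
TP391.1 [Text information processing]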
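Abstract
Current transition-based joint models for Chinese word segmentation, POS tagging, and dependency parsing have two main problems: first, the way the tasks are integrated leaves room for improvement; second, model performance is limited by the size of fully annotated corpora. To address the first problem, this paper uses the internal structure of words to extend word-based dependency trees into character-based dependency trees and, adopting a transition-based strategy, implements a character-based joint model for Chinese word segmentation, POS tagging, and dependency parsing. Following the sequence-labeling approach to Chinese word segmentation, the transition-based segmentation scheme is redesigned as four transition actions, ShiftS, ShiftB, ShiftM, and ShiftE, which also allows results from earlier Chinese word segmentation research to be incorporated into the joint model. To address the second problem, the paper uses partially annotated corpora, extracting string-level n-gram features and structure-level dependency-subtree features from them and incorporating these into the joint model, which yields a semi-supervised joint model for Chinese word segmentation, POS tagging, and dependency parsing. Experiments on the Penn Chinese Treebank show that the model reaches F1 scores of 98.31%, 94.84%, and 81.71% on word segmentation, POS tagging, and dependency parsing, respectively, improvements of 0.92%, 1.77%, and 3.95% over the corresponding single-task models. Of these, the word segmentation and POS tagging scores are the best among the results published to date.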
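The four segmentation actions named in the abstract can be illustrated with a small sketch. It assumes that ShiftS, ShiftB, ShiftM, and ShiftE correspond to the S/B/M/E tags of the standard four-tag sequence-labeling scheme (single-character word, word beginning, word middle, word end); the abstract names the actions but does not spell out this mapping. The function names `words_to_actions` and `actions_to_words` are hypothetical, and the sketch covers segmentation only, not the POS tagging or dependency arc actions of the joint model.

```python
from typing import List


def words_to_actions(words: List[str]) -> List[str]:
    """Derive one action per character from a gold segmentation.

    Assumed mapping (not stated in the abstract):
    single-character word -> ShiftS; otherwise ShiftB, ShiftM..., ShiftE.
    """
    actions: List[str] = []
    for word in words:
        if not word:
            raise ValueError("empty word in segmentation")
        if len(word) == 1:
            actions.append("ShiftS")
        else:
            actions.append("ShiftB")
            actions.extend(["ShiftM"] * (len(word) - 2))
            actions.append("ShiftE")
    return actions


def actions_to_words(chars: str, actions: List[str]) -> List[str]:
    """Replay an action sequence over the characters to recover the words."""
    if len(chars) != len(actions):
        raise ValueError("need exactly one action per character")
    words: List[str] = []
    current = ""  # characters of the word currently being built
    for ch, action in zip(chars, actions):
        if action == "ShiftS":
            if current:
                raise ValueError("ShiftS inside an unfinished word")
            words.append(ch)
        elif action == "ShiftB":
            if current:
                raise ValueError("ShiftB inside an unfinished word")
            current = ch
        elif action == "ShiftM":
            if not current:
                raise ValueError("ShiftM without a preceding ShiftB")
            current += ch
        elif action == "ShiftE":
            if not current:
                raise ValueError("ShiftE without a preceding ShiftB")
            words.append(current + ch)
            current = ""
        else:
            raise ValueError(f"unknown action: {action}")
    if current:
        raise ValueError("sequence ends inside an unfinished word")
    return words


gold = ["北京", "交通大学", "在", "北京"]
actions = words_to_actions(gold)
recovered = actions_to_words("".join(gold), actions)
```

Because each character receives exactly one action, the action sequence has the same length as the sentence, so features designed for tag-based segmenters can in principle be attached to the corresponding actions. This is one reading of the abstract's remark about reusing earlier segmentation research.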
Pages: 1-8+17
Page count: 9