Robust and efficient multiclass SVM models for phrase pattern recognition

Cited by: 68
Authors
Wu, Yu-Chieh [2]
Lee, Yue-Shi [3]
Yang, Jie-Chi [1]
Affiliations
[1] Natl Cent Univ, Grad Inst Network Learning Technol, Jhongli 32001, Taoyuan, Taiwan
[2] Natl Cent Univ, Dept Comp Sci, Jhongli 32001, Taoyuan, Taiwan
[3] Ming Chuan Univ, Dept Comp Sci, Tao Yuan 333, Taiwan
Keywords
machine learning; multiclass classification; natural language processing; support vector machines
DOI
10.1016/j.patcog.2008.02.010
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Phrase pattern recognition (phrase chunking) refers to automatic approaches for identifying predefined phrase structures in a stream of text. Support vector machine (SVM)-based methods have shown excellent performance in many sequential text pattern recognition tasks, such as protein name finding and noun phrase (NP) chunking. Although they yield very accurate results, they are not efficient enough for online applications, which need to handle hundreds of thousands of words in a limited time. In this paper, we first re-examine five typical multiclass SVM methods and their adaptation to phrase chunking. Most of them, however, become inefficient as the number of phrase types grows. We therefore propose two new multiclass SVM models that make the system substantially faster in both training and testing while preserving the accuracy of the SVM. The two methods can also be applied to similar tasks such as named entity recognition and Chinese word segmentation. Experiments on the CoNLL-2000 chunking and Chinese base-chunking tasks show that our method achieves very competitive accuracy and is at least 100 times faster than the state-of-the-art SVM-based phrase chunking method. We also give the computational time complexity and a time cost analysis of our methods. (c) 2008 Elsevier Ltd. All rights reserved.
Pages: 2874-2889
Number of pages: 16