Question classification based on co-training style semi-supervised learning

Cited by: 24
Authors
Yu, Zhengtao [1 ,2 ]
Su, Lei [3 ]
Li, Lina [1 ]
Zhao, Quan [1 ]
Mao, Cunli [1 ]
Guo, Jianyi [1 ,2 ]
Affiliations
[1] Kunming Univ Sci & Technol, Sch Informat Engn & Automat, Kunming 650051, Peoples R China
[2] Inst Intelligent Informat Proc, Key Lab Yunnan Prov, Kunming 650051, Peoples R China
[3] Yunnan Univ, Dept Software, Kunming 650091, Peoples R China
Keywords
Chinese question classification; Word semantic similarity; Semi-supervised learning; Co-training
DOI
10.1016/j.patrec.2010.06.010
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In statistical question classification, semi-supervised learning, which can exploit abundant unlabeled samples, has received substantial attention in recent years. This paper proposes a novel question classification approach based on co-training style semi-supervised learning. The method extracts high-frequency keywords as classification features and uses word semantic similarity to adjust the feature weights. The classifiers are first trained on labeled data; the learned models are then refined using unlabeled data, where an unlabeled sample receives a label only when the classifiers agree on it. Experiments on a Chinese question answering system in the tourism domain compared different feature selection methods, different supervised and semi-supervised algorithms, different feature dimensions, and different unlabeled rates. The results show that the proposed method effectively improves classification accuracy: with 40% of the training set unlabeled, the average accuracy reaches 88.9% on coarse types and 78.2% on fine types, an improvement of roughly 2 to 4 percentage points. (c) 2010 Elsevier B.V. All rights reserved.
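The abstract describes the pipeline only at a high level, so the following Python sketch shows one plausible way the pieces could fit together. Several parts are assumptions, not the paper's method: questions arrive pre-tokenized; `similarity` and `TOY_SIM` are a hypothetical toy stand-in for a real word semantic similarity measure; each keyword feature is weighted by its best similarity match in the question (one reading of "adjusting feature weights"); and the two learners (nearest centroid and 1-nearest-neighbor) are illustrative choices, not the classifiers evaluated in the paper. Both learners see the same feature vectors, a single-view variant; the classic co-training formulation instead trains on two separate feature views. The toy questions are invented for illustration.

```python
from collections import Counter

# HYPOTHETICAL toy table standing in for a real word semantic similarity
# measure; the record does not say which measure the paper uses.
TOY_SIM = {frozenset(("hotel", "inn")): 0.8, frozenset(("price", "cost")): 0.9}

def similarity(a, b):
    """Toy similarity: 1.0 for identical words, a table value, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)

def select_keywords(questions, k):
    """Use the k most frequent words in the labeled questions as features."""
    counts = Counter(w for q in questions for w in q)
    return [w for w, _ in counts.most_common(k)]

def featurize(question, keywords):
    """Assumed weighting: each keyword gets its best similarity to any word in
    the question, so a synonym ('inn' for 'hotel') adds partial weight."""
    return [max((similarity(kw, w) for w in question), default=0.0) for kw in keywords]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

class NearestCentroid:
    def fit(self, X, y):
        self.centroids = {}
        for label in sorted(set(y)):  # sorted for deterministic tie-breaking
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        return max(self.centroids, key=lambda c: cosine(x, self.centroids[c]))

class OneNearestNeighbor:
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x):
        best = max(range(len(self.X)), key=lambda i: cosine(x, self.X[i]))
        return self.y[best]

def co_train(X_lab, y_lab, X_unlab, learners, max_rounds=10):
    """Agreement-based labeling: an unlabeled sample joins the labeled set
    only when every classifier predicts the same label for it."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        for clf in learners:
            clf.fit(X_lab, y_lab)
        remaining, added = [], 0
        for x in pool:
            labels = {clf.predict(x) for clf in learners}
            if len(labels) == 1:  # all classifiers agree
                X_lab.append(x)
                y_lab.append(labels.pop())
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:
            break
    for clf in learners:  # final refit on the enlarged labeled set
        clf.fit(X_lab, y_lab)
    return X_lab, y_lab

# Made-up, pre-tokenized tourism questions for illustration only.
labeled = [
    (["hotel", "price", "room"], "accommodation"),
    (["hotel", "where", "room"], "accommodation"),
    (["ticket", "price", "park"], "scenic_spot"),
    (["park", "open", "time"], "scenic_spot"),
]
unlabeled = [["inn", "cost", "room"], ["park", "ticket", "time"]]

keywords = select_keywords([q for q, _ in labeled], k=6)
X_l = [featurize(q, keywords) for q, _ in labeled]
y_l = [label for _, label in labeled]
X_u = [featurize(q, keywords) for q in unlabeled]

learners = [NearestCentroid(), OneNearestNeighbor()]
X_all, y_all = co_train(X_l, y_l, X_u, learners)
print(y_all[len(y_l):])  # -> ['accommodation', 'scenic_spot']
```

Requiring unanimous agreement trades coverage for precision: fewer unlabeled questions receive pseudo-labels, but those that do are less likely to be wrong, which limits error propagation into later rounds. Here "inn cost room" is labeled `accommodation` even though it shares only one exact keyword with the labeled data, because the similarity weighting maps "inn" and "cost" onto "hotel" and "price".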
Pages: 1975-1980
Page count: 6