Utilizing high-quality feature extension mode to classify Chinese short-text

Cited by: 5
Authors
Fan X. [1]
Hu H. [1]
Affiliation
[1] College of Computer Science and Technology, Chongqing University of Posts and Telecommunications
Keywords
Chinese short-text classification; Co-occurrence relationship; Feature extension; High-quality feature extension mode
DOI
10.4304/jnw.5.12.1417-1425
Abstract
This paper presents a method for classifying Chinese short-texts that have a weak concept signal, in which high-quality feature extension modes are extracted and used effectively. In the method, a feature extension mode is considered to be a set of terms that have a co-occurrence relationship in the training data, and three measures that decide whether a mode is high-quality, i.e., confidence, category homoplasy, and relevancy strength, are presented. Then, an algorithm that extracts high-quality feature extension modes from training data is designed. Next, a Chinese short-text classification algorithm utilizing feature extension modes is presented, in which a short-text is extended by adding new features or modifying the weights of initial features, according to the relationship between non-feature terms and feature extension modes. The experiments show that (1) a high-quality feature extension mode helps to improve Chinese short-text classification; (2) the proposed method obtains higher classification performance than conventional text classification methods. © 2010 Academy Publisher.
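The abstract does not define the three quality measures or the exact extension rules, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a mode is a pair of co-occurring terms, that "confidence" means the share of a pair's co-occurrences falling in its dominant category, and that partial matches add the missing term while full matches raise the weights of the initial terms. The function names, thresholds, and the 0.5 weight boost are all hypothetical; category homoplasy and relevancy strength are not modeled.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_extension_modes(docs, min_support=2, min_confidence=0.6):
    """Mine term pairs that co-occur often and mostly within one category.

    docs: iterable of (set_of_terms, category_label).
    Returns {frozenset({term_a, term_b}): dominant_category}.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    pair_total = Counter()
    pair_by_category = defaultdict(Counter)
    for terms, category in docs:
        for pair in combinations(sorted(terms), 2):
            key = frozenset(pair)
            pair_total[key] += 1
            pair_by_category[key][category] += 1

    modes = {}
    for key, total in pair_total.items():
        if total < min_support:
            continue
        category, hits = pair_by_category[key].most_common(1)[0]
        # Assumed stand-in for "confidence": dominant-category share.
        if hits / total >= min_confidence:
            modes[key] = category
    return modes


def extend_short_text(terms, modes, boost=0.5):
    """Extend a short text with the mined modes (assumed rules).

    Every initial term starts with weight 1.0. If the text contains only
    part of a mode, the missing terms are added with weight `boost`.
    If it contains the whole mode, each term of the mode gains `boost`.
    """
    original = set(terms)
    weights = {term: 1.0 for term in original}
    for mode in modes:
        present = mode & original
        if not present:
            continue
        if present == mode:
            for term in mode:
                weights[term] += boost
        else:
            for term in mode - present:
                weights[term] = weights.get(term, 0.0) + boost
    return weights


if __name__ == "__main__":
    training = [
        ({"stock", "rise"}, "finance"),
        ({"stock", "rise", "market"}, "finance"),
        ({"team", "win"}, "sports"),
        ({"team", "win"}, "sports"),
    ]
    mined = mine_extension_modes(training)
    print(extend_short_text({"stock"}, mined))
```

The extended term weights would then be passed to any conventional classifier; the sketch stops before that step because the abstract does not say which classifier is used.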
Pages: 1417-1425
Page count: 8