Introducing a family of linear measures for feature selection in text categorization

Cited by: 40
Authors
Combarro, EF [1]
Montañés, E [1]
Díaz, I [1]
Ranilla, J [1]
Mones, R [1]
Affiliations
[1] Univ Oviedo, Ctr Artificial Intelligence, Gijon 33204, Spain
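Keywords
text categorization; feature selection; filtering measures; machine learning
DOI
10.1109/TKDE.2005.149
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 [Pattern Recognition and Intelligent Systems]; 0812 [Computer Science and Technology]; 0835 [Software Engineering]; 1405 [Intelligence Science and Technology]
Abstract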
Text Categorization, which consists of automatically assigning documents to a set of categories, usually involves the management of a huge number of features. Most of them are irrelevant and others introduce noise which could mislead the classifiers. Thus, feature reduction is often performed in order to increase the efficiency and effectiveness of the classification. In this paper, we propose to select relevant features by means of a family of linear filtering measures which are simpler than the usual measures applied for this purpose. We carry out experiments over two different corpora and find that the proposed measures perform better than the existing ones.
Pages: 1223-1232
Page count: 10
Related papers
18 in total
[1]
AUTOMATED LEARNING OF DECISION RULES FOR TEXT CATEGORIZATION [J].
APTE, C ;
DAMERAU, F ;
WEISS, SM .
ACM TRANSACTIONS ON INFORMATION SYSTEMS, 1994, 12 (03) :233-251
[2]
CORTES C, 1995, MACH LEARN, V20, P273, DOI 10.1023/A:1022627411411
[3]
Improving performance of text categorization by combining filtering and support vector machines [J].
Díaz, I ;
Ranilla, J ;
Montañes, E ;
Fernández, J ;
Combarro, EF .
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2004, 55 (07) :579-592
[4]
Dumais S., 1998, Proceedings of the 1998 ACM CIKM International Conference on Information and Knowledge Management, P148, DOI 10.1145/288627.288651
[5]
Furnkranz J., 1994, Proceedings of the Eleventh International Conference on Machine Learning, P70
[6]
Galavotti L, 2000, LECT NOTES COMPUT SC, V1923, P59
[7]
Joachims T., 1998, Lecture Notes in Computer Science, P137, DOI 10.1007/BFB0026683
[8]
EFFECTS OF COMPUTER AND NONCOMPUTER ENVIRONMENTS ON STUDENTS CONCEPTUALIZATIONS OF GEOMETRIC MOTIONS [J].
JOHNSONGENTILE, K ;
CLEMENTS, DH ;
BATTISTA, MT .
JOURNAL OF EDUCATIONAL COMPUTING RESEARCH, 1994, 11 (02) :121-140
[9]
Mladenic D, 1999, MACHINE LEARNING, PROCEEDINGS, P258
[10]
Montañés E, 2003, LECT NOTES COMPUT SC, V2810, P589, DOI 10.1007/978-3-540-45231-7_54