Automatic textual document categorization based on generalized instance sets and a metamodel

Cited by: 38
Authors
Lam, W [1]
Han, YQ [1]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
Keywords
text classification; instance-based learning; metamodel learning
DOI
10.1109/TPAMI.2003.1195997
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a new approach to text categorization known as the generalized instance set (GIS) algorithm under the framework of generalized instance patterns. Our GIS algorithm unifies the strengths of k-NN and linear classifiers and adapts to characteristics of text categorization problems. It focuses on refining the original instances and constructs a set of generalized instances. We also propose a metamodel framework based on category feature characteristics. It has a metalearning phase which discovers a relationship between category feature characteristics and each component algorithm. Extensive experiments have been conducted on two large-scale document corpora for both GIS and the metamodel. The results demonstrate that both approaches generally achieve promising text categorization performance.
Pages: 628-633
Number of pages: 6