Hierarchical text classification and evaluation

Cited by: 187
Authors
Sun, AX [1 ]
Lim, EP [1 ]
Affiliation
[1] Nanyang Technol Univ, Ctr Adv Informat Syst, Singapore 639798, Singapore
Source
2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2001
DOI
10.1109/ICDM.2001.989560
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hierarchical classification refers to the assignment of one or more suitable categories from a hierarchical category space to a document. While previous work in hierarchical classification focused on virtual category trees, where documents are assigned only to the leaf categories, we propose a top-down level-based classification method that can classify documents to both leaf and internal categories. Because the standard performance measures assume independence between categories, they do not account for documents incorrectly classified into categories that are similar to, or not far from, the correct ones in the category tree. We therefore propose the Category-Similarity Measures and Distance-Based Measures, which consider the degree of misclassification when measuring classification performance. An experiment was carried out to measure the performance of our proposed hierarchical classification method. The results showed that our method performs well on the Reuters text collection when enough training documents are given, and that the new measures do take into account the contributions of misclassified documents.
Pages: 521-528
Page count: 8