Combining rough decisions for intelligent text mining using Dempster's rule

Cited by: 14
Authors
Bi, Yaxin [1]
McClean, Sally [2]
Anderson, Terry [1]
Affiliations
[1] Univ Ulster, Sch Comp & Math, Newtownabbey BT37 0QB, Antrim, North Ireland
[2] Univ Ulster, Sch Comp & Informat Engn, Coleraine BT52 1SA, Londonderry, North Ireland
Keywords
rule induction; text mining; rough set; Dempster's rule of combination;
DOI
10.1007/s10462-007-9049-y
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
An important issue in text mining is how to make use of multiple pieces of discovered knowledge to improve future decisions. In this paper, we propose a new approach to combining multiple sets of rules for text categorization using Dempster's rule of combination. We develop a boosting-like technique for generating multiple sets of rules based on rough set theory, and we model the classification decisions from multiple sets of rules as pieces of evidence that can be combined by Dempster's rule of combination. We apply these methods, individually and in combination, to 10 of the 20 newsgroups in the 20-newsgroups benchmark data collection (Baker and McCallum 1998). Our experimental results show that, on the 10 groups of the benchmark data, the best combination of multiple sets of rules performs statistically significantly better than the best single set of rules. A comparative analysis of the Dempster-Shafer and majority voting (MV) methods, together with an overfitting study, confirms the advantage and robustness of our approach.
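As a rough illustration of the combination step named in the abstract, the following is a minimal sketch of Dempster's rule of combination for two mass functions. It is not the authors' implementation: the function name `combine_dempster`, the category labels, and the mass values are all hypothetical, chosen only to show how two classification decisions could be treated as pieces of evidence and merged.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping a frozenset of categories
    (a focal element) to its mass; the masses are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to contradictory focal elements.
            conflict += mass_a * mass_b
    if conflict >= 1.0 - 1e-12:
        raise ValueError("Total conflict: Dempster's rule is undefined.")
    # Renormalize by the non-conflicting mass.
    return {s: mass / (1.0 - conflict) for s, mass in combined.items()}


# Hypothetical example: two rule sets each back one category and leave
# the remaining mass on the whole frame of discernment (ignorance).
frame = frozenset({"sport", "politics", "science"})
m1 = {frozenset({"sport"}): 0.6, frame: 0.4}
m2 = {frozenset({"politics"}): 0.5, frame: 0.5}

combined = combine_dempster(m1, m2)
singletons = {s: m for s, m in combined.items() if len(s) == 1}
best = max(singletons, key=singletons.get)
print(sorted(best), round(singletons[best], 4))
```

In this toy case the two sources disagree, so the conflicting mass (0.6 × 0.5) is discarded and the rest is renormalized; the category backed by the stronger source keeps the largest combined mass. A majority vote over the same two decisions would be tied, which is one intuition for comparing the two schemes.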
Pages: 191-209
Number of pages: 19
Related papers
33 in total
[1] [Anonymous], ROUGH SET THEORETICA
[2] [Anonymous], 1994, Advances in the Dempster-Shafer Theory of Evidence
[3] APHINYANAPHONGS Y, 2003, P AMIA S, P31
[4] APTE, C; DAMERAU, F; WEISS, SM. AUTOMATED LEARNING OF DECISION RULES FOR TEXT CATEGORIZATION [J]. ACM TRANSACTIONS ON INFORMATION SYSTEMS, 1994, 12 (03): 233-251
[5] Baker L. D., 1998, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P96, DOI 10.1145/290941.290970
[6] BI Y, 2004, THESIS U ULSTER
[7] Bi YX, 2004, LECT NOTES ARTIF INT, V3215, P521
[8] Bi YX, 2004, LECT NOTES COMPUT SC, V3177, P457
[9] Chouchoulas, A; Shen, Q. Rough set-aided keyword reduction for text categorization [J]. APPLIED ARTIFICIAL INTELLIGENCE, 2001, 15 (09): 843-873
[10] Cohen WW, 1999, SIXTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-99)/ELEVENTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE (IAAI-99), P335