Maximizing text-mining performance

Cited by: 93
Authors
Weiss, SM [1]
Apte, C [1]
Damerau, FJ [1]
Johnson, DE [1]
Oles, FJ [1]
Goetz, T [1]
Hampp, T [1]
Affiliation
[1] IBM Corp, Thomas J Watson Res Ctr, Nat Language Understanding Grp, Yorktown Heights, NY 10598 USA
Source
IEEE INTELLIGENT SYSTEMS & THEIR APPLICATIONS | 1999 / Vol. 14 / No. 4
DOI
10.1109/5254.784086
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text categorization is the problem of automatically assigning predefined categories to documents. A new text-mining approach is presented that uses an adaptive-resampling strategy to train decision-tree classifiers. The approach is demonstrated using the Reuters-21578 benchmark data and a real-world customer e-mail routing system.
Pages: 63-69
Page count: 7