BoosTexter: A boosting-based system for text categorization

Times cited: 1363
Authors
Schapire, RE
Singer, Y
Affiliations
[1] AT&T Labs Research, Shannon Laboratory, Florham Park, NJ 07932, USA
[2] Hebrew University of Jerusalem, School of Computer Science and Engineering, IL-91904 Jerusalem, Israel
Keywords
text and speech categorization; multiclass classification problems; boosting algorithms
DOI
10.1023/A:1007649029923
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This work focuses on algorithms which learn from examples to perform multiclass text and speech categorization tasks. Our approach is based on a new and improved family of boosting algorithms. We describe in detail an implementation, called BoosTexter, of the new boosting algorithms for text categorization tasks. We present results comparing the performance of BoosTexter and a number of other text-categorization algorithms on a variety of tasks. We conclude by describing the application of our system to automatic call-type identification from unconstrained spoken customer responses.
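As a rough illustration of what boosting for multiclass, multi-label text categorization can look like, here is a minimal sketch assuming an AdaBoost.MH-style scheme. This is an assumption, not something the record states. Each round picks one word and, for every label, assigns one real-valued score when the word is present and another when it is absent. It then reweights the (document, label) pairs so later rounds focus on the pairs it got wrong. The function names `train_adaboost_mh` and `score`, the smoothing constant `eps`, and the toy call-routing data are hypothetical. This is not the BoosTexter implementation described in the paper.

```python
import math


def train_adaboost_mh(docs, labels, label_set, rounds=10, eps=None):
    """Hypothetical sketch of AdaBoost.MH-style boosting with word-presence stumps.

    docs: list of sets of words; labels: list of sets of labels per document.
    Returns a list of (word, c) pairs, where c[j][l] is the score for label l
    when the word is absent (j = 0) or present (j = 1).
    """
    m, k = len(docs), len(label_set)
    if eps is None:
        eps = 1.0 / (m * k)  # assumed smoothing constant that keeps the logs finite
    # Y[i][l] = +1 if document i carries label l, else -1
    Y = [{l: 1 if l in labels[i] else -1 for l in label_set} for i in range(m)]
    # Start with a uniform distribution over (document, label) pairs
    D = [{l: 1.0 / (m * k) for l in label_set} for _ in range(m)]
    vocab = sorted(set().union(*docs))
    model = []
    for _ in range(rounds):
        best = None
        for w in vocab:
            # W[j][l] = [weight of positive pairs, weight of negative pairs] in block j
            W = [{l: [0.0, 0.0] for l in label_set} for _ in range(2)]
            for i in range(m):
                j = 1 if w in docs[i] else 0
                for l in label_set:
                    W[j][l][0 if Y[i][l] > 0 else 1] += D[i][l]
            # Choose the word minimizing Z = 2 * sum over blocks and labels of sqrt(W+ * W-)
            Z = 2 * sum(math.sqrt(W[j][l][0] * W[j][l][1])
                        for j in range(2) for l in label_set)
            if best is None or Z < best[0]:
                best = (Z, w, W)
        _, w, W = best
        # Real-valued prediction for each block (word absent/present) and label
        c = [{l: 0.5 * math.log((W[j][l][0] + eps) / (W[j][l][1] + eps))
              for l in label_set} for j in range(2)]
        model.append((w, c))
        # Raise weight on pairs the stump gets wrong, lower it on the rest, then normalize
        for i in range(m):
            j = 1 if w in docs[i] else 0
            for l in label_set:
                D[i][l] *= math.exp(-Y[i][l] * c[j][l])
        total = sum(sum(d.values()) for d in D)
        for d in D:
            for l in d:
                d[l] /= total
    return model


def score(model, doc, label_set):
    """Sum the stump outputs per label; a higher score means the label is more likely."""
    f = {l: 0.0 for l in label_set}
    for w, c in model:
        j = 1 if w in doc else 0
        for l in label_set:
            f[l] += c[j][l]
    return f


# Toy example with hypothetical call-type labels
docs = [{"bill", "charge"}, {"bill", "payment"}, {"agent", "talk"}, {"agent", "person"}]
labels = [{"billing"}, {"billing"}, {"operator"}, {"operator"}]
label_set = ["billing", "operator"]
model = train_adaboost_mh(docs, labels, label_set, rounds=3)
print(score(model, {"bill", "question"}, label_set))
```

In this sketch, labels with a positive total score would be predicted. For single-label tasks, taking the highest-scoring label gives a top-1 prediction. Scanning the whole vocabulary on every round keeps the code short but is slow. A practical system would index documents by word instead.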
Pages: 135-168
Number of pages: 34