Support vector machine active learning with applications to text classification

Cited by: 1537
Authors
Tong, S [1]
Koller, D [1]
Affiliation
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Keywords
active learning; selective sampling; support vector machines; classification; relevance feedback
DOI
10.1162/153244302760185243
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Support vector machines have met with significant success in numerous real-world learning tasks. However, like most machine learning algorithms, they are generally applied using a randomly selected training set classified in advance. In many settings, we also have the option of using pool-based active learning. Instead of using a randomly selected training set, the learner has access to a pool of unlabeled instances and can request the labels for some number of them. We introduce a new algorithm for performing active learning with support vector machines, i.e., an algorithm for choosing which instances to request next. We provide a theoretical motivation for the algorithm using the notion of a version space. We present experimental results showing that employing our active learning method can significantly reduce the need for labeled training instances in both the standard inductive and transductive settings.
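The abstract does not spell out the selection rule, so the following is only a minimal sketch of one common margin-based heuristic for pool-based active learning: given a linear decision function, request the label of the unlabeled instance closest to the separating hyperplane. The weights `w`, the bias `b`, the toy `pool`, and the helper names `decision_value` and `select_query` are all hypothetical and chosen for illustration; a real setup would retrain the support vector machine after each new label.

```python
from typing import Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w . x + b of a linear classifier (assumed already trained)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_query(w: Sequence[float], b: float,
                 pool: Sequence[Sequence[float]]) -> int:
    """Return the index of the pool instance closest to the hyperplane.

    Illustrative margin-based heuristic: the instance with the smallest
    |w . x + b| is the one the current classifier is least certain about.
    """
    return min(range(len(pool)),
               key=lambda i: abs(decision_value(w, b, pool[i])))


# Hypothetical toy data: four unlabeled 2-D instances and a fixed hyperplane.
pool = [(2.0, 1.0), (0.1, -0.2), (-3.0, 0.5), (0.6, 0.5)]
w, b = (1.0, -1.0), 0.0

query_index = select_query(w, b, pool)
print("Request the label of pool instance", query_index)
```

In a full loop, the chosen instance would be labeled, moved from the pool to the training set, and the classifier retrained before the next query.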
Pages: 45-66
Page count: 22