The true sample complexity of active learning

被引:65
作者
Balcan, Maria-Florina [1 ]
Hanneke, Steve [2 ]
Vaughan, Jennifer Wortman [3 ]
机构
[1] Georgia Inst Technol, Sch Comp Sci, Coll Comp, Atlanta, GA 30332 USA
[2] Carnegie Mellon Univ, Dept Stat, Pittsburgh, PA 15213 USA
[3] Harvard Univ, Sch Engn & Appl Sci, Cambridge, MA 02138 USA
基金
美国国家科学基金会;
关键词
Active learning; Sample complexity; Selective sampling; Sequential design; Learning theory; Classification;
D O I
10.1007/s10994-010-5174-y
中图分类号
TP18 [人工智能理论];
学科分类号
140502 [人工智能];
摘要
We describe and explore a new perspective on the sample complexity of active learning. In many situations where it was generally believed that active learning does not help, we show that active learning does help in the limit, often with exponential improvements in sample complexity. This contrasts with the traditional analysis of active learning problems such as non-homogeneous linear separators or depth-limited decision trees, in which Omega(1/epsilon) lower bounds are common. Such lower bounds should be interpreted carefully; indeed, we prove that it is always possible to learn an epsilon-good classifier with a number of samples asymptotically smaller than this. These new insights arise from a subtle variation on the traditional definition of sample complexity, not previously recognized in the active learning literature.
引用
收藏
页码:111 / 139
页数:29
相关论文
共 21 条
[1]
Strong minimax lower bounds for learning [J].
Antos, A ;
Lugosi, G .
MACHINE LEARNING, 1998, 30 (01) :31-56
[2]
Balcan M., 2006, P 23 INT C MACH LEAR, P65
[3]
Balcan M.-F., 2006, SEMISUPERVISED LEARN
[4]
BALCAN MF, 2007, P 20 ANN C LEARN THE
[5]
BALCAN MF, 2008, P 21 ANN C LEARN THE
[6]
LEARNABILITY AND THE VAPNIK-CHERVONENKIS DIMENSION [J].
BLUMER, A ;
EHRENFEUCHT, A ;
HAUSSLER, D ;
WARMUTH, MK .
JOURNAL OF THE ACM, 1989, 36 (04) :929-965
[7]
CASTRO R, 2007, P 20 ANN C LEARN THE
[8]
IMPROVING GENERALIZATION WITH ACTIVE LEARNING [J].
COHN, D ;
ATLAS, L ;
LADNER, R .
MACHINE LEARNING, 1994, 15 (02) :201-221
[9]
DASGUPTA S, 2005, P 18 ANN C LEARN THE
[10]
Dasgupta S., 2007, ADV NEURAL INFORM PR