The Perceptron algorithm versus Winnow: linear versus logarithmic mistake bounds when few input variables are relevant

被引:51
作者
Kivinen, J
Warmuth, MK
Auer, P
机构
[1] Graz Univ Technol, Inst Theoret Comp Sci, A-8010 Graz, Austria
[2] Univ Helsinki, Dept Comp Sci, FIN-00014 Helsinki, Finland
[3] Univ Calif Santa Cruz, Santa Cruz, CA 95064 USA
关键词
linear threshold functions; perceptron algorithm; relevant variables; multiplicative updates; mistake bounds;
D O I
10.1016/S0004-3702(97)00039-8
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We give an adversary strategy that forces the Perceptron algorithm to make Omega(kN) mistakes in learning monotone disjunctions over N variables with at most k literals. In contrast, Littlestone's algorithm Winnow makes at most O(klog N) mistakes for the same problem. Both algorithms use thresholded linear functions as their hypotheses. However, Winnow does multiplicative updates to its weight vector instead of the additive updates of the Perceptron algorithm, In general, we call an algorithm additive if its weight vector is always a sum of a fixed initial weight vector and some linear combination of already seen instances. Thus, the Perceptron algorithm is an example of an additive algorithm, We show that an adversary can force any additive algorithm to make (N + k - 1)/2 mistakes in learning a monotone disjunction of at most k literals. Simple experiments show that for k much less than N, Winnow clearly outperforms the Perceptron algorithm also on nonadversarial random data. (C) 1997 Elsevier Science B.V.
引用
收藏
页码:325 / 343
页数:19
相关论文
共 23 条
[1]  
[Anonymous], 1979, Soviet Math. Dokl
[2]  
[Anonymous], COMPUTATIONAL LEARNI
[3]  
Auer P, 1995, AN S FDN CO, P312, DOI 10.1109/SFCS.1995.492487
[4]   LEARNABILITY AND THE VAPNIK-CHERVONENKIS DIMENSION [J].
BLUMER, A ;
EHRENFEUCHT, A ;
HAUSSLER, D ;
WARMUTH, MK .
JOURNAL OF THE ACM, 1989, 36 (04) :929-965
[5]  
Cesa-Bianchi N., 1993, Proceedings of the Twenty-Fifth Annual ACM Symposium on the Theory of Computing, P382, DOI 10.1145/167088.167198
[6]  
CESABIANCHI N, 1994, UCSCCRL9433
[7]  
Duda R. O., 1973, PATTERN CLASSIFICATI, V3
[8]  
Helmbold D. P., 1995, Proceedings of the Eighth Annual Conference on Computational Learning Theory, P61, DOI 10.1145/225298.225305
[9]  
KIVINEN J, 1979, INFORM COMPUT, V132, P1
[10]  
LITTLESTONE N, 1991, PROCEEDINGS OF THE FOURTH ANNUAL WORKSHOP ON COMPUTATIONAL LEARNING THEORY, P147