Sequential prediction of individual sequences under general loss functions

Cited by: 90
Authors
Haussler, D [1]
Kivinen, J
Warmuth, MK
Affiliations
[1] Univ Calif Santa Cruz, Santa Cruz, CA 95064 USA
[2] Univ Helsinki, Dept Comp Sci, FIN-00014 Helsinki, Finland
Funding
National Science Foundation (USA);
Keywords
on-line learning; universal prediction; worst case loss bounds; worst case regret;
DOI
10.1109/18.705569
Chinese Library Classification
TP [automation technology; computer technology];
Discipline code
0812;
Abstract
We consider adaptive sequential prediction of arbitrary binary sequences when the performance is evaluated using a general loss function. The goal is to predict on each individual sequence nearly as well as the best prediction strategy in a given comparison class of (possibly adaptive) prediction strategies, called experts. By using a general loss function, we generalize previous work on universal prediction, forecasting, and data compression. However, here we restrict ourselves to the case when the comparison class is finite. For a given sequence, we define the regret as the total loss on the entire sequence suffered by the adaptive sequential predictor, minus the total loss suffered by the predictor in the comparison class that performs best on that particular sequence. We show that for a large class of loss functions, the minimax regret is either Theta(log N) or Omega(sqrt(l log N)), depending on the loss function, where N is the number of predictors in the comparison class and l is the length of the sequence to be predicted. The former case was shown previously by Vovk; we give a simplified analysis with an explicit closed form for the constant in the minimax regret formula, and give a probabilistic argument that shows this constant is the best possible. Some weak regularity conditions are imposed on the loss function in obtaining these results. We also extend our analysis to the case of predicting arbitrary sequences that take real values in the interval [0,1].
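The Theta(log N) regime described in the abstract is the one attained by Vovk-style aggregation of a finite expert class. As an illustration only (this is not the paper's algorithm as stated; the square loss, the learning rate eta, and the function name are assumptions made for the sketch), a minimal exponentially weighted average forecaster looks like:

```python
import math

def exp_weights_forecast(expert_preds, outcomes, eta=0.5):
    """Exponentially weighted average forecaster (illustrative sketch).

    expert_preds: list of N per-expert prediction sequences, values in [0, 1]
    outcomes:     binary outcome sequence of length l
    eta:          learning rate (hypothetical choice for this sketch)
    Returns (forecaster's total loss, best expert's total loss).
    """
    n = len(expert_preds)
    weights = [1.0] * n          # uniform prior over the N experts
    my_loss = 0.0
    expert_loss = [0.0] * n
    for t, y in enumerate(outcomes):
        total = sum(weights)
        # Predict the weight-averaged expert opinion.
        pred = sum(w * p[t] for w, p in zip(weights, expert_preds)) / total
        my_loss += (pred - y) ** 2            # square loss as an example
        # Penalize each expert exponentially in its own loss.
        for i, p in enumerate(expert_preds):
            loss = (p[t] - y) ** 2
            expert_loss[i] += loss
            weights[i] *= math.exp(-eta * loss)
    return my_loss, min(expert_loss)
```

For losses with the right curvature (the square loss on [0, 1] among them), the gap between the forecaster's total loss and the best expert's total loss stays bounded by a constant multiple of log N, independent of the sequence length l — the Theta(log N) case of the minimax regret.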
Pages: 1906-1925
Page count: 20
References
38 entries total
[1] Auer P, 1995, AN S FDN CO, P322, DOI 10.1109/SFCS.1995.492488
[2] Barron A., 1992, P 3 NEC S COMP COGN, P74
[3] Barron A. R., 1996, P 9 ANN C COMP LEARN
[4] Billingsley Patrick, 1995, Probability and Measure, V3
[5] Blackwell D., 1956, PAC J MATH, V6, P1, DOI 10.2140/pjm.1956.6.1
[6] Blackwell D., 1954, P INT C MATH, V3, P336
[7] Cesa-Bianchi N., 1996, Proceedings of the Ninth Annual Conference on Computational Learning Theory, P314, DOI 10.1145/238061.238162
[8] Cesa-Bianchi N, Freund Y, Helmbold DP, Warmuth MK. On-line prediction and conversion strategies. MACHINE LEARNING, 1996, 25(01): 71-110
[9] Cesa-Bianchi N, Freund Y, Haussler D, Helmbold DP, Schapire RE, Warmuth MK. How to use expert advice. JOURNAL OF THE ACM, 1997, 44(03): 427-485
[10] Cesa-Bianchi N, Long PM, Warmuth MK. Worst-case quadratic loss bounds for prediction using linear functions and gradient descent. IEEE TRANSACTIONS ON NEURAL NETWORKS, 1996, 7(03): 604-619