Adaptive-resolution reinforcement learning with polynomial exploration in deterministic domains

Cited by: 23
Authors
Bernstein, Andrey [1 ]
Shimkin, Nahum [1 ]
Affiliations
[1] Technion Israel Inst Technol, Dept Elect Engn, IL-32000 Haifa, Israel
Keywords
Reinforcement learning; Adaptive resolution; Kernel functions; Efficient exploration; TIME; ALGORITHM;
DOI
10.1007/s10994-010-5186-7
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a model-based learning algorithm, the Adaptive-resolution Reinforcement Learning (ARL) algorithm, which aims to solve the online, continuous-state-space reinforcement learning problem in a deterministic domain. Our goal is to combine adaptive-resolution approximation schemes with efficient exploration in order to obtain polynomial learning rates. The proposed algorithm adaptively approximates the optimal value function via kernel-based averaging, moving from a coarse to a fine kernel-based representation of the state space; this enables finer resolution in the "important" areas of the state space and coarser resolution elsewhere. We take an online learning approach in which these important areas are discovered online, using an uncertainty-intervals exploration technique. In addition, we introduce an incremental variant of ARL (IARL), a more practical version of the original algorithm with reduced computational complexity at each stage. Polynomial learning rates in terms of mistake bound (in a PAC framework) are established for these algorithms under appropriate continuity assumptions.
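The kernel-based averaging the abstract refers to can be illustrated with a minimal Nadaraya-Watson-style sketch. This is not the paper's exact construction: the Gaussian kernel, the function names, and the one-dimensional state are illustrative assumptions. It only shows how a bandwidth parameter controls resolution, with a narrow bandwidth giving a fine (local) value estimate and a wide one a coarse (smoothed) estimate:

```python
import math

def gaussian_kernel(x, xi, bandwidth):
    """Illustrative smoothing kernel; smaller bandwidth = finer resolution."""
    return math.exp(-((x - xi) ** 2) / (2.0 * bandwidth ** 2))

def kernel_value_estimate(x, samples, bandwidth):
    """Kernel-weighted average of stored value estimates.

    samples: list of (state, value) pairs gathered online.
    Returns 0.0 when no sample has appreciable weight at this resolution.
    """
    weights = [gaussian_kernel(x, xi, bandwidth) for xi, _ in samples]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    return sum(w * v for w, (_, v) in zip(weights, samples)) / total

# Coarse vs. fine estimates at the same query state:
samples = [(0.0, 1.0), (1.0, 2.0), (2.0, 4.0)]
coarse = kernel_value_estimate(0.5, samples, bandwidth=1.0)  # smooths over all three samples
fine = kernel_value_estimate(0.5, samples, bandwidth=0.1)    # dominated by the two nearest samples
```

With the narrow bandwidth, only the two equidistant neighbors (values 1.0 and 2.0) carry weight, so the fine estimate is their mean, 1.5; the coarse estimate also pulls in the distant high-value sample. The ARL algorithm's adaptive-resolution idea is to spend fine resolution like this only in the regions that exploration reveals to matter.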
Pages: 359-397
Page count: 39