The ODE method for convergence of stochastic approximation and reinforcement learning

被引：329

作者：

Borkar, VS ^{[1
]}

Meyn, SP

机构：

[1] Tata Inst Fundamental Res, Sch Technol & Comp Sci, Mumbai 400005, India

[2] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA

[3] Univ Illinois, Coordinated Sci Lab, Urbana, IL 61801 USA

[4] Indian Inst Sci, Bangalore 560012, Karnataka, India

来源：

SIAM JOURNAL ON CONTROL AND OPTIMIZATION | 2000年 / 38卷 / 02期

关键词：

stochastic approximation; ODE method; stability; asynchronous algorithms; reinforcement learning;

D O I：

10.1137/S0363012997331639

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

It is shown here that stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated ODE. This in turn implies convergence of the algorithm. Several specific classes of algorithms are considered as applications. It is found that the results provide (i) a simpler derivation of known results for reinforcement learning algorithms; (ii) a proof for the first time that a class of asynchronous stochastic approximation algorithms are convergent without using any a priori assumption of stability; (iii) a proof for the first time that asynchronous adaptive critic and Q-learning algorithms are convergent for the average cost optimal control problem.

引用

页码：447 / 469

页数：23

共 21 条

[1]

ABOUNADI J, UNPUB SIAM J CONTROL

[2]

[Anonymous], 1992, Stochastic Stability of Markov chains

[3]

Barto A G., 1983, IEEE Trans, on Systems, Man, and Cybernetics, V13, P835

[4]

Benveniste A, 1990, Adaptive algorithms and stochastic approximations

[5]

Bertsekas D. P., 1996, Neuro Dynamic Programming, V1st

[6]

Borkar V. S., 1996, APPL MATH, V24, P169

[7] Asynchronous stochastic approximations [J].

Borkar, VS .

SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 1998, 36 (03) :840-851

[8] Stochastic approximation with two time scales [J].

Borkar, VS .

SYSTEMS & CONTROL LETTERS, 1997, 29 (05) :291-294

[9] An analog scheme for fixed point computation .1. Theory [J].

Borkar, VS ;

Soumyanath, K .

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-FUNDAMENTAL THEORY AND APPLICATIONS, 1997, 44 (04) :351-355

[10] ON POSITIVE HARRIS RECURRENCE OF MULTICLASS QUEUEING NETWORKS: A UNIFIED APPROACH VIA FLUID LIMIT MODELS [J].

Dai, J. G. .

ANNALS OF APPLIED PROBABILITY, 1995, 5 (01) :49-77

← 1 2 3 →