ASYNCHRONOUS STOCHASTIC APPROXIMATION AND Q-LEARNING

Cited by: 391
Author
TSITSIKLIS, JN
Institution
[1] Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA
Keywords
REINFORCEMENT LEARNING; Q-LEARNING; DYNAMIC PROGRAMMING; STOCHASTIC APPROXIMATION
DOI
10.1023/A:1022689125041
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
Pages: 185-202 (18 pages)
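
As an illustration of the algorithm the abstract refers to, below is a minimal sketch of asynchronous Q-learning in Python. It is not from the paper: the two-state MDP, its transition and reward tables, and the uniform random choice of which component to update are illustrative assumptions. Only the update rule Q(s,a) <- Q(s,a) + alpha_t(s,a) * (r + gamma * max_a' Q(s',a') - Q(s,a)) and the usual step-size conditions (sum of alpha diverges, sum of alpha^2 converges) follow the setting analyzed in the paper.

import random

GAMMA = 0.9          # discount factor
N_STATES, N_ACTIONS = 2, 2

# Toy dynamics (assumed for illustration, not from the paper):
# P[s][a] = list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)],
        1: [(0.5, 0, 1.0), (0.5, 1, 0.0)]},
    1: {0: [(1.0, 0, 2.0)],
        1: [(1.0, 1, 0.5)]},
}

def sample(state, action):
    """Draw a (next_state, reward) pair from the toy transition model."""
    u, acc = random.random(), 0.0
    for prob, nxt, rew in P[state][action]:
        acc += prob
        if u < acc:
            return nxt, rew
    return P[state][action][-1][1:]  # guard against floating-point rounding

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
visits = [[0] * N_ACTIONS for _ in range(N_STATES)]

for t in range(100000):
    # Asynchronous update: a single (s, a) component is revised per step,
    # with its own state-action-dependent step size.
    s = random.randrange(N_STATES)
    a = random.randrange(N_ACTIONS)
    visits[s][a] += 1
    alpha = 1.0 / visits[s][a]           # sum(alpha) = inf, sum(alpha^2) < inf
    nxt, rew = sample(s, a)
    target = rew + GAMMA * max(Q[nxt])   # noisy one-step lookahead estimate
    Q[s][a] += alpha * (target - Q[s][a])

print(Q)  # approaches the optimal Q-values of the toy MDP

Because every (s, a) pair is visited infinitely often and the step sizes meet the conditions above, the iterates converge to the optimal Q-function; this is the kind of guarantee the paper establishes, under conditions more general than sketched here.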