A Survey of Hierarchical Reinforcement Learning

Cited by: 5

Authors:

Shen Jing (沈晶)

Gu Guochang (顾国昌)

Liu Haibo (刘海波)

Affiliation:

[1] College of Computer Science and Technology, Harbin Engineering University

Source:

Pattern Recognition and Artificial Intelligence (模式识别与人工智能) | 2005 / Vol. 18 / No. 05

Keywords:

Hierarchical reinforcement learning; semi-Markov process; Q-learning; multi-agent system

DOI:

Not available

Chinese Library Classification (CLC) number:

TP181 [Automated reasoning, machine learning]

Discipline classification codes:

081104; 0812; 0835; 1405

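Abstract:

Reinforcement learning improves a policy through trial-and-error interaction with the environment, and its capacity for self-learning and online learning has made it an important branch of machine learning research. However, reinforcement learning has long been plagued by the "curse of dimensionality". In recent years, hierarchical reinforcement learning methods have introduced abstraction mechanisms and made significant progress in overcoming this problem. As theoretical groundwork, this paper first introduces the basic principles of reinforcement learning and the Q-learning algorithm based on semi-Markov processes. It then presents the basic ideas and Q-learning update formulas of three typical single-agent hierarchical reinforcement learning methods (Option, HAM, and MAXQ), summarizes the essential characteristics of each, and gives a comparative analysis and evaluation of the three. Finally, it identifies the problems that must be solved when extending single-agent hierarchical reinforcement learning methods to the multi-agent setting.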
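The abstract names Q-learning over semi-Markov processes as the common basis for the three methods but does not reproduce the formula. As a rough illustration only, the sketch below shows the standard SMDP Q-learning backup for a temporally extended action, Q(s,o) ← Q(s,o) + α[r + γ^τ max Q(s',o') − Q(s,o)]. The function name, the state and option labels, and the dictionary-based table are hypothetical choices made for this example and are not taken from the paper.

```python
from collections import defaultdict


def smdp_q_update(q, state, option, reward, tau, next_state, options,
                  alpha=0.1, gamma=0.9):
    """Apply one SMDP Q-learning backup and return the updated value.

    reward is the discounted reward accumulated while the option ran,
    and tau is the number of primitive time steps it lasted.
    """
    # Greedy value of the state in which the option terminated.
    best_next = max(q[(next_state, o)] for o in options)
    # Discount the bootstrap term by gamma**tau, not by a single gamma.
    target = reward + (gamma ** tau) * best_next
    q[(state, option)] += alpha * (target - q[(state, option)])
    return q[(state, option)]


# Hypothetical two-option example; unseen entries default to 0.0.
q = defaultdict(float)
q[("s1", "go_left")] = 1.0
new_value = smdp_q_update(
    q, "s0", "go_right", reward=2.0, tau=3, next_state="s1",
    options=["go_left", "go_right"], alpha=0.5, gamma=0.5,
)
print(new_value)
```

With `tau=1` the backup reduces to ordinary one-step Q-learning, which is why the semi-Markov form can serve as a shared starting point for methods that act over extended time scales.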


Pages: 574-581

Page count: 8

