Transfer in variable-reward hierarchical reinforcement learning

Cited by: 46
Authors
Mehta, Neville [1]
Natarajan, Sriraam [1]
Tadepalli, Prasad [1]
Fern, Alan [1]
Affiliation
[1] Oregon State University, School of Electrical Engineering & Computer Science, Corvallis, OR 97330, USA
Keywords
Hierarchical reinforcement learning; Transfer learning; Average-reward learning; Multi-criteria learning
DOI
10.1007/s10994-008-5061-y
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Transfer learning seeks to leverage previously learned tasks to achieve faster learning in a new task. In this paper, we consider transfer learning in the context of related but distinct Reinforcement Learning (RL) problems. In particular, our RL problems are derived from Semi-Markov Decision Processes (SMDPs) that share the same transition dynamics but have different reward functions that are linear in a set of reward features. We formally define the transfer learning problem in the context of RL as learning an efficient algorithm to solve any SMDP drawn from a fixed distribution after experiencing a finite number of them. Furthermore, we introduce an online algorithm to solve this problem, Variable-Reward Reinforcement Learning (VRRL), that compactly stores the optimal value functions for several SMDPs, and uses them to optimally initialize the value function for a new SMDP. We generalize our method to a hierarchical RL setting where the different SMDPs share the same task hierarchy. Our experimental results in a simplified real-time strategy domain show that significant transfer learning occurs in both flat and hierarchical settings. Transfer is especially effective in the hierarchical setting where the overall value functions are decomposed into subtask value functions which are more widely amenable to transfer across different SMDPs.
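Illustrative sketch
The abstract's key mechanism is that, because all SMDPs share transition dynamics and their rewards are linear in a common set of reward features, each optimal value function can be stored in vector form (one component per reward feature) and re-weighted under any new reward-weight vector, with the per-state maximum over stored functions initializing the value function for the new SMDP. Below is a minimal Python sketch of that reuse idea, assuming a tabular state space; the names (VRRLLibrary, add, initialize) and all numbers are hypothetical illustrations, not the paper's code.

import numpy as np

class VRRLLibrary:
    """Stores vector-valued value functions learned on previous SMDPs."""

    def __init__(self, n_states: int, n_features: int):
        self.n_states = n_states
        self.n_features = n_features
        self.stored = []  # each entry: (n_states, n_features) array

    def add(self, vector_value_fn):
        # Keep the per-feature optimal value function learned for one SMDP.
        assert vector_value_fn.shape == (self.n_states, self.n_features)
        self.stored.append(vector_value_fn)

    def initialize(self, w):
        # Each stored vector value function V_i induces a scalar function
        # V_i @ w under the new reward weights w; the per-state maximum
        # serves as the starting value function for the new SMDP.
        if not self.stored:
            return np.zeros(self.n_states)
        candidates = np.stack([V @ w for V in self.stored])  # (k, n_states)
        return candidates.max(axis=0)

# Usage (hypothetical numbers): after learning on two source SMDPs,
# warm-start a third whose reward weights are [0.5, 0.2, 0.3].
lib = VRRLLibrary(n_states=5, n_features=3)
rng = np.random.default_rng(0)
lib.add(rng.random((5, 3)))
lib.add(rng.random((5, 3)))
v0 = lib.initialize(np.array([0.5, 0.2, 0.3]))  # initial value per state

In the hierarchical variant described in the abstract, the same store-and-reweight step would apply to each subtask value function in the shared task hierarchy, which is why transfer is reported to be especially effective in that setting.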
Pages: 289-312
Page count: 24