Feature Selection and Feature Learning for High-dimensional Batch Reinforcement Learning: A Survey

Cited by: 3
Authors
De-Rong Liu
Hong-Liang Li
Ding Wang
Affiliation
[1] State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences
Keywords
Intelligent control; reinforcement learning; adaptive dynamic programming; feature selection; feature learning; big data;
DOI
Not available
CLC Number
TP181 [automated reasoning and machine learning]; TP391.4 [pattern recognition and devices]
Subject Classification Codes
081104; 0812; 0835; 1405; 0811; 081101
Abstract
Tremendous amounts of data are generated and stored every day in many complex engineering and social systems. It is both significant and feasible to exploit such big data for better decision making through machine learning techniques. In this paper, we focus on batch reinforcement learning (RL) algorithms for discounted Markov decision processes (MDPs) with large discrete or continuous state spaces, aiming to learn the best possible policy from a fixed amount of training data. Batch RL algorithms with handcrafted feature representations work well for low-dimensional MDPs. However, many real-world RL tasks involve high-dimensional state spaces, where designing features for value function approximation by hand is difficult or even infeasible. To cope with high-dimensional RL problems, the desire for data-driven features has motivated much work on incorporating feature selection and feature learning into traditional batch RL algorithms. This paper provides a comprehensive survey of automatic feature selection and unsupervised feature learning for high-dimensional batch RL. Moreover, we present recent theoretical developments that apply statistical learning theory to establish finite-sample error bounds for batch RL algorithms based on weighted Lp norms. Finally, we outline some future directions for RL algorithms, theory, and applications.
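To make the batch RL setting concrete, the following is a minimal sketch of Fitted Q-Iteration, one of the batch algorithm families covered by surveys of this kind: a Q-function is learned by repeatedly regressing Bellman targets computed from a fixed set of (s, a, r, s') transitions. The tabular "regressor", the toy two-state chain, and all names below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def fitted_q_iteration(transitions, n_states, n_actions, gamma=0.9, iters=50):
    """Learn Q from a fixed batch of (s, a, r, s') tuples.

    A tabular regressor (per-cell averaging) stands in for the function
    approximator; batch RL methods would use a parametric or kernel model here.
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        q_new = q.copy()
        # Collect each sample's Bellman target r + gamma * max_a' Q(s', a')
        targets = {}
        for s, a, r, s2 in transitions:
            targets.setdefault((s, a), []).append(r + gamma * q[s2].max())
        # "Regress" by averaging targets per (state, action) cell
        for (s, a), ts in targets.items():
            q_new[s, a] = np.mean(ts)
        q = q_new
    return q

# Toy two-state chain: action 1 in state 0 leads to the rewarding state 1,
# where action 0 yields reward 1 and stays put.
batch = [(0, 0, 0.0, 0), (0, 1, 0.0, 1), (1, 0, 1.0, 1), (1, 1, 0.0, 0)]
Q = fitted_q_iteration(batch, n_states=2, n_actions=2)
greedy_policy = Q.argmax(axis=1)  # greedy policy w.r.t. the learned Q
```

Note that the entire loop touches only the fixed batch; no further interaction with the environment occurs, which is exactly the constraint the survey's finite-sample analyses address.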
Pages: 229-242
Page count: 14
Related Papers
10 items
[1] D. P. Bertsekas. Approximate policy iteration: a survey and some new methods. Journal of Control Theory and Applications, 2011(3).
[2] D. Liu, D. Wang, X. Yang. An iterative adaptive dynamic programming algorithm for optimal control of unknown discrete-time nonlinear systems with constrained inputs. Information Sciences, 2013.
[3] D. Wang, D. Liu, Q. Wei, D. Zhao, N. Jin. Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming. Automatica, 2012, 48(8): 1825-1832.
[4] A. Farahmand, C. Szepesvari. Model selection in reinforcement learning. Machine Learning, 2011, 85(3): 299-332.
[5] A. Antos, C. Szepesvari, R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 2008, 71(1): 89-129.
[6] Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems, 2003(1).
[7] Kernel-based reinforcement learning. Machine Learning, 2002(2).
[8] J. A. Boyan. Technical update: least-squares temporal difference learning. Machine Learning, 2002(2).
[9] S. J. Bradtke, A. G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 1996(1).
[10] M. Loth, M. Davy, P. Preux. Sparse temporal difference learning using LASSO. IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007.