Policy gradient reinforcement learning for fast quadrupedal locomotion

Cited by: 278
Authors
Kohl, N [1]
Stone, P [1]
Affiliation
[1] Univ Texas, Dept Comp Sci, Austin, TX 78712 USA
Source
2004 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1-5, PROCEEDINGS | 2004
Keywords
learning control; walking robots; multi-legged robots
DOI
10.1109/ROBOT.2004.1307456
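Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812 [Computer science and technology]
Abstract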
This paper presents a machine learning approach to optimizing a quadrupedal trot gait for forward speed. Given a parameterized walk designed for a specific robot, we propose using a form of policy gradient reinforcement learning to automatically search the set of possible parameters with the goal of finding the fastest possible walk. We implement and test our approach on a commercially available quadrupedal robot platform, namely the Sony Aibo robot. After about three hours of learning, all on the physical robots and with no human intervention other than to change the batteries, the robots achieved a gait faster than any previously known gait for the Aibo, significantly outperforming a variety of existing hand-coded and learned solutions.
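The abstract describes the learner only as "a form of policy gradient reinforcement learning" that searches the parameters of a hand-designed walk. One common way to do that without a model of the robot is a finite-difference gradient estimate: score a batch of randomly perturbed copies of the current parameter vector, then move the parameters toward the perturbations that scored better. The sketch below illustrates that idea under stated assumptions; the helper name `policy_gradient_step`, the perturbation sizes, the step size, and the toy scoring function are hypothetical, and on a real robot `evaluate` would be a timed walking trial. Consult the paper itself for the exact update rule the authors used.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def policy_gradient_step(
    policy: Sequence[float],
    evaluate: Callable[[Sequence[float]], float],
    epsilons: Sequence[float],
    num_policies: int = 15,
    step_size: float = 2.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """One iteration of a finite-difference policy gradient search (hypothetical helper).

    Each test policy perturbs every parameter j by -eps[j], 0 or +eps[j],
    chosen uniformly at random. Scores are averaged per parameter and per
    perturbation sign to estimate which direction improves the score.
    """
    if rng is None:
        rng = random.Random()
    n = len(policy)
    # Perturbation signs in {-1, 0, +1} for every parameter of every test policy.
    signs = [[rng.choice((-1, 0, 1)) for _ in range(n)] for _ in range(num_policies)]
    # Evaluate each perturbed policy once (on a robot: one timed trial).
    scores = [
        evaluate([p + s * e for p, s, e in zip(policy, row, epsilons)])
        for row in signs
    ]
    adjustment = [0.0] * n
    for j in range(n):
        groups = {-1: [], 0: [], 1: []}
        for row, score in zip(signs, scores):
            groups[row[j]].append(score)
        if not all(groups.values()):
            continue  # a sign group is empty; leave this parameter unchanged
        avg = {k: sum(v) / len(v) for k, v in groups.items()}
        if avg[0] > avg[1] and avg[0] > avg[-1]:
            continue  # the unperturbed value scored best; keep this parameter
        adjustment[j] = avg[1] - avg[-1]
    norm = math.sqrt(sum(a * a for a in adjustment))
    if norm == 0.0:
        return list(policy)  # no usable direction this iteration
    # Rescale the adjustment vector so every update has the same length.
    return [p + step_size * a / norm for p, a in zip(policy, adjustment)]


if __name__ == "__main__":
    # Toy stand-in for "walk speed": higher is better, peak at a made-up target.
    target = [1.0, -2.0, 0.5]

    def score(params: Sequence[float]) -> float:
        return -sum((p - t) ** 2 for p, t in zip(params, target))

    params = [0.0, 0.0, 0.0]
    rng = random.Random(42)
    for _ in range(60):
        params = policy_gradient_step(params, score, [0.05] * 3, step_size=0.1, rng=rng)
    print(params, score(params))
```

Normalizing the adjustment to a fixed `step_size` keeps each update the same length regardless of how noisy the score differences are, which matters when every evaluation is a slow physical trial.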
Pages: 2619-2624
Page count: 6
Related papers (17 in total)
[1] BAGNELL JA, 2001, INT C ROBOTICS AUTOM
[2] Baxter J, Bartlett PL, 2001, Infinite-horizon policy-gradient estimation, JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, V15, P319-350
[3] HENGST B, 2001, LECT NOTES ARTIF INT, V2377, P368
[4] Hornby GS, 2000, Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No.00CH37065), P3040, DOI 10.1109/ROBOT.2000.846489
[5] Hornby GS, 1999, GECCO-99: PROCEEDINGS OF THE GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, P1297
[6] KIM MS, 2003, AUSTR C ROB AUT BRIS
[7] NY AY, 2004, IN PRESS ADV NEURAL, V17
[8] Press W, 1993, Numerical Recipes, 2nd ed.
[9] QUINLAN MJ, 2003, AUSTR C ROB AUT BRIS
[10] ROFER T, 2003, GERMANTEAM ROBOCUP 2