A Distributed Cooperative Dynamic Task Planning Algorithm for Multiple Satellites Based on Multi-agent Hybrid Learning

Cited: 48
Authors
Wang Chong [1 ]
Li Jun [1 ]
Jing Ning [1 ]
Wang Jun [1 ]
Chen Hao [1 ]
Affiliation
[1] Natl Univ Def Technol, Coll Elect Sci & Engn, Changsha 410073, Hunan, Peoples R China
Keywords
multiple satellites dynamic task planning problem; multi-agent systems; reinforcement learning; neuroevolution of augmenting topologies; transfer learning
DOI
10.1016/S1000-9361(11)60057-5
CLC number
V [Aviation, Aerospace]
Subject classification code
082501 [Flight Vehicle Design]
Abstract
Traditionally, heuristic re-planning algorithms are used to tackle the dynamic task planning problem for multiple satellites. However, traditional heuristic strategies depend on the concrete tasks, which often compromises the optimality of the result. Noticing that the historical information of cooperative task planning influences later planning results, we propose a hybrid learning algorithm for dynamic multi-satellite task planning, based on multi-agent reinforcement learning with policy iteration and on transfer learning. The reinforcement learning strategy of each satellite is represented by a neural network. The policy network individuals with the best topological structure and weights are found by iterative co-evolutionary search. To avoid the failure of historical learning caused by randomly occurring observation requests, a novel approach is proposed that balances the quality and efficiency of task planning: the historical learning strategy is converted into the initial learning strategy of the current episode by transfer learning. Simulations and analysis show the feasibility and adaptability of the proposed approach, especially in situations with randomly occurring observation requests.
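The hybrid scheme described above can be illustrated with a minimal sketch: a small population of policy networks is evolved against a planning reward, and the best historical policy seeds the population for the next episode (the transfer-learning step). This is a toy, assuming a single-layer linear policy and weight-only mutation in place of full NEAT topology evolution; all function names and the tiny request data are illustrative, not from the paper.

```python
import random

random.seed(0)

def make_policy(n_in, n_out):
    # single-layer policy: one weight row per candidate satellite
    return [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

def act(policy, obs):
    # greedy action: satellite with the highest weighted score
    scores = [sum(w * x for w, x in zip(row, obs)) for row in policy]
    return scores.index(max(scores))

def fitness(policy, requests):
    # toy planning reward: +1 per request assigned to the intended satellite
    return sum(1 for obs, best in requests if act(policy, obs) == best)

def mutate(policy, sigma=0.3):
    # Gaussian perturbation of all weights (stand-in for NEAT mutation)
    return [[w + random.gauss(0, sigma) for w in row] for row in policy]

def evolve(requests, seed_policy=None, pop_size=20, gens=30, n_in=3, n_out=2):
    # transfer learning: seed the population from the historical policy if given
    if seed_policy is None:
        pop = [make_policy(n_in, n_out) for _ in range(pop_size)]
    else:
        pop = [seed_policy] + [mutate(seed_policy) for _ in range(pop_size - 1)]
    for _ in range(gens):
        pop.sort(key=lambda p: fitness(p, requests), reverse=True)
        elite = pop[: pop_size // 4]
        pop = elite + [mutate(random.choice(elite)) for _ in range(pop_size - len(elite))]
    return max(pop, key=lambda p: fitness(p, requests))

# historical planning episode: (observation features, intended satellite)
hist_requests = [((1, 0, 0), 0), ((0, 1, 0), 1), ((0, 0, 1), 0)]
hist_best = evolve(hist_requests)

# new episode with a randomly arriving request; the historical policy
# becomes the current initial strategy instead of restarting from scratch
new_requests = hist_requests + [((1, 1, 0), 1)]
new_best = evolve(new_requests, seed_policy=hist_best)
print(fitness(new_best, new_requests))
```

Seeding from the historical policy is what lets planning resume cheaply when new observation requests arrive, rather than re-evolving every policy from random weights.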
Pages: 493-505
Page count: 13