
Optimization method for orbit transfer of all-electric propulsion satellite based on reinforcement learning
Cite this article: Mingren HAN, Yufeng WANG. Optimization method for orbit transfer of all-electric propulsion satellite based on reinforcement learning[J]. Systems Engineering and Electronics (系统工程与电子技术), 2022, 44(5): 1652-1661.
Authors: Mingren HAN, Yufeng WANG
Institution: 1. Beijing Institute of Control Engineering, Beijing 100094, China; 2. Science and Technology on Space Intelligent Control Laboratory, Beijing 100094, China
Funding: National Natural Science Foundation of China (11502017)
Abstract: Using electric thrusters for autonomous orbit transfer is one of the key technologies for all-electric propulsion satellites. To address the orbit-raising problem of all-electric propulsion geostationary orbit (GEO) satellites, this paper combines the generalized advantage estimator (GAE) with proximal policy optimization (PPO) and proposes a reinforcement learning-based method for optimizing time-optimal low-thrust orbit transfer strategies, accounting for multiple orbital perturbations and the Earth-shadow constraint. Because an overly large state space and sparse rewards make training difficult, training acceleration methods such as action output mapping and hierarchical rewards are proposed, which improve training efficiency and speed up convergence. Numerical simulations and comparisons with the traditional direct method, indirect method, and feedback control method show that the proposed method is simpler, more flexible, and more efficient, and that it preserves the optimality of the orbit transfer time.

Keywords: all-electric propulsion satellite; low-thrust orbit transfer optimization; reinforcement learning; proximal policy optimization (PPO); training acceleration method
Received: 2021-07-09
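
For readers unfamiliar with the two techniques named in the abstract, the following is a minimal, generic sketch of how GAE advantages and the PPO clipped surrogate term are commonly computed. It is not the authors' implementation: the function names, the discount `gamma`, the GAE parameter `lam`, and the clip range `eps` are illustrative assumptions, and the abstract does not state the values used in the paper.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory.

    rewards: list of T per-step rewards.
    values:  list of T + 1 value estimates; the last entry is the
             bootstrap value of the state after the final step.
    gamma, lam: assumed placeholder hyperparameters, not from the paper.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Accumulate discounted TD residuals backwards in time.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clip_term(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate (to be maximized).

    ratio: new-policy probability divided by old-policy probability.
    eps:   assumed clip range, not taken from the paper.
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    adv = gae_advantages([1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0)
    print(adv)
    print(ppo_clip_term(1.5, adv[0]))
```

With `gamma` and `lam` both set to 1 and all value estimates at zero, each advantage reduces to the sum of the remaining rewards, which is a convenient sanity check for the backward recursion.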