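Q-learning reinforcement learning guidance law
Cite this article: ZHANG Qinhao, AO Baiqiang, ZHANG Qinxue. Q-learning reinforcement learning guidance law[J]. Systems Engineering and Electronics, 2020, 42(2): 414-419.
Authors: ZHANG Qinhao  AO Baiqiang  ZHANG Qinxue
Affiliations: 1. Beijing Institute of Electronic Engineering, Beijing 100854; 2. College of Computer Science, North China Institute of Aerospace Engineering, Langfang, Hebei 065000
Funding: China Postdoctoral Science Foundation (2017M620863)
Abstract: On future battlefields, intelligent missiles will become precise and effective strike weapons, and missile intelligence has become a major development trend. Building on the traditional proportional guidance law, this paper proposes a reinforcement-learning-based guidance algorithm with a variable proportional coefficient. The algorithm takes the line-of-sight rate as the state, designs the reward function according to the miss distance, and designs a discretized action space, so as to select the correct guidance command for the missile. Simulation experiments verify that the proposed algorithm achieves better guidance accuracy than the traditional proportional guidance law and gives the missile an autonomous decision-making capability.

Keywords: proportional guidance  guidance law  miss distance  maneuvering target  reinforcement learning  Q-learning  temporal difference algorithm
Received: 2019-07-26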

Reinforcement learning guidance law of Q-learning
Qinhao ZHANG, Baiqiang AO, Qinxue ZHANG. Reinforcement learning guidance law of Q-learning[J]. Systems Engineering and Electronics, 2020, 42(2): 414-419.
Authors:Qinhao ZHANG  Baiqiang AO  Qinxue ZHANG
Institution:1. Beijing Institute of Electronic Engineering, Beijing 100854, China; 2. College of Computer Science, North China Institute of Aerospace Engineering, Langfang 065000, China
Abstract:Intelligent missiles are becoming a major development trend and are expected to serve as precise and effective strike weapons on future battlefields. Building on the traditional proportional guidance law, this paper proposes a reinforcement-learning-based guidance algorithm with a variable proportional coefficient. The algorithm takes the line-of-sight rate as the state, designs a reward function based on the miss distance, and designs a discretized action space in order to select the correct guidance command for the missile. Simulation results show that the proposed algorithm achieves better guidance accuracy than the traditional proportional guidance law and gives the missile an autonomous decision-making capability.
Keywords:proportional guidance  guidance law  miss distance  maneuvering target  reinforcement learning  Q-learning  temporal difference algorithm
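
The scheme summarized in the abstract (line-of-sight rate as the state, a discretized set of proportional coefficients as the actions, and a reward derived from the miss distance) can be illustrated with a minimal tabular Q-learning sketch. This is not the authors' implementation: the bin edges, candidate gains, learning rate, discount factor, exploration rate, the reward shape, and every function name below are assumptions made for illustration only, since the abstract does not give these details.

```python
import numpy as np

# All numeric values are illustrative assumptions, not taken from the paper.
LOS_RATE_EDGES = np.linspace(-0.5, 0.5, 11)  # rad/s, bin edges for the state
NAV_GAINS = np.array([2.0, 3.0, 4.0, 5.0])   # candidate proportional coefficients
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1        # learning rate, discount, exploration

def los_rate_to_state(los_rate):
    """Map a continuous line-of-sight rate to a discrete state index (0..11)."""
    return int(np.digitize(los_rate, LOS_RATE_EDGES))

def pn_command(gain, closing_speed, los_rate):
    """Proportional guidance: commanded acceleration a = N * Vc * lambda_dot."""
    return gain * closing_speed * los_rate

def choose_action(q_table, state, rng):
    """Epsilon-greedy choice of an index into NAV_GAINS."""
    if rng.random() < EPSILON:
        return int(rng.integers(len(NAV_GAINS)))
    return int(np.argmax(q_table[state]))

def miss_distance_reward(miss_distance):
    """Hypothetical terminal reward: a smaller miss distance gives a larger reward."""
    return -abs(miss_distance)

def q_update(q_table, state, action, reward, next_state, terminal):
    """One-step Q-learning (temporal difference) update, applied in place."""
    if terminal:
        target = reward
    else:
        target = reward + GAMMA * np.max(q_table[next_state])
    q_table[state, action] += ALPHA * (target - q_table[state, action])

# One row per line-of-sight-rate bin (including the two out-of-range bins),
# one column per candidate proportional coefficient.
q_table = np.zeros((len(LOS_RATE_EDGES) + 1, len(NAV_GAINS)))
```

In a training loop under these assumptions, each guidance step would call `los_rate_to_state`, pick a gain with `choose_action`, apply `pn_command` in the engagement simulation, and call `q_update` with the reward; the miss-distance reward would be issued only at the end of the engagement, with `terminal=True`.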