Maneuvering Decision of UCAV in Close Air Combat Based on LSTM-PPO Algorithm


Citation:	DING Wei, WANG Yuan, DING Dali, XIE Lei, ZHOU Huan, TAN Mulai, LYU Chenghui. Maneuvering Decision of UCAV in Close Air Combat Based on LSTM-PPO Algorithm[J]. Journal of Air Force Engineering University (Natural Science Edition), 2022, 23(3): 19-25.

Authors:	DING Wei, WANG Yuan, DING Dali, XIE Lei, ZHOU Huan, TAN Mulai, LYU Chenghui

Affiliation:	Aeronautics Engineering College, Air Force Engineering University, Xi'an 710038, China

Funding:	Natural Science Foundation of Shaanxi Province (2020JQ-481)

Abstract:	With the growing military use of unmanned combat aerial vehicles (UCAVs), unmanned combat is expected to become the main mode of combat on the future air battlefield. In close air combat the environment is complex and the engagement situation changes rapidly. Methods based on game theory cannot meet real-time requirements because of the heavy iterative computation they require, while data-driven methods suffer from long training times and low execution efficiency. To address these problems, this paper proposes a UCAV close air combat maneuvering decision method based on deep reinforcement learning. First, a flight drive module is built on a three-degree-of-freedom (3-DOF) UCAV model to form the state-transition update mechanism. Then, Ornstein-Uhlenbeck (OU) random noise is added to the proximal policy optimization (PPO) algorithm to improve the UCAV's ability to explore unknown regions of the state space, and a long short-term memory (LSTM) network is incorporated to strengthen learning from sequential sample data, improving both the training efficiency and the training effect of the algorithm. Finally, three groups of close air combat simulation experiments, with performance compared against the baseline PPO algorithm, verify the effectiveness and superiority of the proposed method.

Keywords:	unmanned combat aerial vehicle; air combat maneuvering decision; deep reinforcement learning; proximal policy optimization; long short-term memory network
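The abstract does not give the UCAV three-degree-of-freedom model itself. As an assumption, the sketch below uses the point-mass form that is common in the air combat literature, with tangential overload `nx`, normal overload `nz` and bank angle `mu` as control inputs, and integrates it with a forward Euler step to play the role of the state-transition update. The names `UCAVState` and `step_3dof`, the state layout and the time step are hypothetical; the paper's exact model and integrator may differ.

```python
import math
from dataclasses import dataclass

G = 9.81  # gravitational acceleration, m/s^2


@dataclass
class UCAVState:
    """Hypothetical point-mass state: position (m), speed (m/s), angles (rad)."""
    x: float
    y: float
    z: float      # altitude, positive up
    v: float
    gamma: float  # flight-path (climb) angle
    psi: float    # heading angle


def step_3dof(s: UCAVState, nx: float, nz: float, mu: float, dt: float) -> UCAVState:
    """One forward-Euler step of a commonly used 3-DOF point-mass model (assumed form).

    nx: tangential overload, nz: normal overload, mu: bank angle (rad).
    """
    dx = s.v * math.cos(s.gamma) * math.cos(s.psi)
    dy = s.v * math.cos(s.gamma) * math.sin(s.psi)
    dz = s.v * math.sin(s.gamma)
    dv = G * (nx - math.sin(s.gamma))
    dgamma = G / s.v * (nz * math.cos(mu) - math.cos(s.gamma))
    dpsi = G * nz * math.sin(mu) / (s.v * math.cos(s.gamma))
    return UCAVState(
        x=s.x + dx * dt,
        y=s.y + dy * dt,
        z=s.z + dz * dt,
        v=s.v + dv * dt,
        gamma=s.gamma + dgamma * dt,
        psi=s.psi + dpsi * dt,
    )


if __name__ == "__main__":
    # Straight and level flight: nx = 0, nz = 1, mu = 0 keeps v, gamma and psi constant.
    s = UCAVState(x=0.0, y=0.0, z=5000.0, v=200.0, gamma=0.0, psi=0.0)
    for _ in range(10):
        s = step_3dof(s, nx=0.0, nz=1.0, mu=0.0, dt=0.1)
    print(round(s.x, 6), round(s.v, 6))  # 200.0 200.0
```

In a reinforcement learning loop, the agent's action would map onto `(nx, nz, mu)` and each call to `step_3dof` would produce the next state from which the reward is computed.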
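For reference, the clipped surrogate objective of standard PPO can be computed as in the sketch below. The abstract does not say whether the paper uses this clipped variant or the KL-penalty variant, so this is an assumption. The probability ratio is formed from log-probabilities under the current and the data-collecting policy, and the clipping range `eps = 0.2` is a commonly used default rather than a value reported in the paper; the function name is hypothetical.

```python
import math


def ppo_clip_objective(log_probs_new, log_probs_old, advantages, eps=0.2):
    """Mean clipped surrogate objective of PPO (to be maximized).

    log_probs_new / log_probs_old: log pi(a|s) under the current policy and the
    policy that collected the data. advantages: advantage estimates for the same
    (s, a) pairs. eps: clipping range (illustrative default).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

A policy-gradient step would minimize the negative of this value; the clipping removes the incentive to push the ratio far from 1 in a single update.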
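Ornstein-Uhlenbeck noise is temporally correlated, so consecutive perturbations of the maneuver command drift smoothly instead of jumping independently at every step. The abstract does not state the noise parameters or exactly where the noise is injected; the sketch below assumes the usual Euler-Maruyama discretization x_{t+1} = x_t + θ(μ − x_t)Δt + σ√Δt·N(0, 1), added to the policy's action. The class name `OUNoise` and the default values of `theta`, `sigma` and `dt` are illustrative.

```python
import math
import random


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process for exploration (illustrative parameters)."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
        self.size = size
        self.mu = mu        # long-run mean of the noise
        self.theta = theta  # mean-reversion rate
        self.sigma = sigma  # noise scale
        self.dt = dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Restart the process at its mean, e.g. at the beginning of each episode."""
        self.state = [self.mu] * self.size

    def sample(self):
        """Advance one Euler-Maruyama step and return the new noise vector."""
        self.state = [
            x
            + self.theta * (self.mu - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


if __name__ == "__main__":
    noise = OUNoise(size=3, seed=0)
    action = [0.5, 1.0, 0.0]  # hypothetical policy output, e.g. (nx, nz, mu)
    noisy_action = [a + n for a, n in zip(action, noise.sample())]
    print(noisy_action)
```

Because each sample depends on the previous one, the perturbed action keeps pushing in a consistent direction for several steps, which is the property usually cited for using OU noise in continuous-control exploration.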
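The abstract says only that an LSTM network is combined with PPO to learn from sequential samples; it does not give the architecture. To illustrate what the recurrence adds, the sketch below is a single LSTM cell in plain Python: the hidden state `h` and cell state `c` are carried from one time step to the next, so the final output summarizes a window of past air combat observations rather than a single snapshot. The class name `LSTMCell`, the layer sizes and the random initialization are hypothetical; a practical implementation would presumably use a deep learning framework and train these weights jointly with the PPO networks.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class LSTMCell:
    """Minimal LSTM cell in pure Python (random untrained weights, forward pass only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n_in = input_size + hidden_size
        # One weight matrix and bias vector per gate: input (i), forget (f), candidate (g), output (o).
        self.W = {k: [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden_size)]
                  for k in "ifgo"}
        self.b = {k: [0.0] * hidden_size for k in "ifgo"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def step(self, x, h, c):
        """Consume one input vector x given the previous hidden state h and cell state c."""
        z = list(x) + list(h)  # concatenate current input and previous hidden state
        i = [sigmoid(a) for a in self._affine("i", z)]
        f = [sigmoid(a) for a in self._affine("f", z)]
        g = [math.tanh(a) for a in self._affine("g", z)]
        o = [sigmoid(a) for a in self._affine("o", z)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # keep old memory, add new
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence):
        """Process a sequence of observation vectors and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h
```

In an LSTM-PPO setup of this kind, the final hidden state would typically feed the policy and value heads, so the decision at each step can depend on how the engagement has evolved over recent steps.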
