Autonomous Underwater Vehicles Optimal Trajectory Control Based on Deep Reinforcement Learning
Citation: MA Qiongxiong, YU Runsheng, SHI Zhenyu, HUANG Chaoxing, LI Tenglong. Autonomous Underwater Vehicles Optimal Trajectory Control Based on Deep Reinforcement Learning[J]. Journal of South China Normal University (Natural Science Edition), 2018, 50(1): 118-123.
Authors: MA Qiongxiong, YU Runsheng, SHI Zhenyu, HUANG Chaoxing, LI Tenglong
Affiliation: School of Information and Optoelectronic Science and Engineering, South China Normal University
Funding: Science and Technology Program of Guangdong Province; National Undergraduate Innovation and Entrepreneurship Training Program; Young Teachers' Research Cultivation Fund of South China Normal University
Abstract: To achieve high accuracy and stability when an autonomous underwater vehicle (AUV) tracks complex trajectories, a deep-reinforcement-learning method for optimal AUV trajectory control is proposed. First, a control model is built from two deep neural networks: an Actor network that selects actions and a Critic network that evaluates the Actor network's training results. Second, a suitable reward signal is constructed so that the deep reinforcement learning algorithm fits the AUV dynamics model. Finally, a training-success criterion based on the standard deviation of the reward signal is proposed, so that the AUV maintains stability while guaranteeing accuracy. Simulation results show that, for complex trajectory tracking within a given accuracy, the algorithm outperforms the conventional PID control algorithm.
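The training-success criterion described above (stopping when the standard deviation of recent reward signals is small) can be sketched as follows. The window size and threshold are illustrative assumptions, not values from the paper:

```python
from statistics import pstdev

def training_converged(episode_rewards, window=50, std_threshold=5.0):
    """Return True when the last `window` episode rewards are stable,
    i.e. their (population) standard deviation falls below `std_threshold`."""
    if len(episode_rewards) < window:
        return False
    recent = list(episode_rewards)[-window:]
    return pstdev(recent) < std_threshold

# Usage: a slowly drifting reward series passes, an oscillating one does not.
stable = [100.0 + 0.1 * i for i in range(60)]
noisy = [100.0 * (-1) ** i for i in range(60)]
```

Checking dispersion rather than the mean reward alone is what ties the criterion to stability: a policy whose returns still swing widely has not settled, even if its average return looks good.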

Keywords: motion control
Received: 2017-04-05
Revised: 2017-07-31

Autonomous Underwater Vehicles Optimal Trajectory Control Based on Deep Reinforcement Learning
Autonomous Underwater Vehicles Optimal Trajectory Control Based on Deep Reinforcement Learning[J]. Journal of South China Normal University (Natural Science Edition), 2018, 50(1): 118-123.
Abstract: To enable autonomous underwater vehicles (AUV) to achieve high accuracy and stability in tracking complex trajectories, an AUV optimal trajectory tracking method based on deep reinforcement learning is proposed. First, the control model is built from two deep neural networks: the Actor network, trained to select actions, and the Critic network, trained to evaluate the training outcome of the Actor network. Second, a proper reward function is constructed to make the deep reinforcement learning algorithm feasible for the underwater vehicle dynamics model. Lastly, a criterion for successful network training is set based on the standard deviation of the reward function, to ensure the stability of the AUV within a given accuracy. Simulations show that the algorithm performs better than PID control when tracking a complex trajectory.
Keywords: motion control
This article is indexed by CNKI and other databases.