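Path planning method of snake-like robot based on improved DDPG algorithm
Cite this article: HAO Chongqing, REN Boheng, ZHAO Qingpeng, HOU Baoshuai, BAI Tong, WU Xiaojing, FAN Jinhui. Path planning method of snake-like robot based on improved DDPG algorithm[J]. Journal of Hebei University of Science and Technology, 2023, 44(2): 165-176.
Authors: HAO Chongqing, REN Boheng, ZHAO Qingpeng, HOU Baoshuai, BAI Tong, WU Xiaojing, FAN Jinhui
Affiliations: School of Electrical Engineering, Hebei University of Science and Technology; School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications
Funding: National Natural Science Foundation of China (62003129)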
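Abstract: When a snake-like robot performs path planning in a complex environment, traditional reinforcement learning algorithms train slowly. They also tend to get trapped in dead zones, which slows convergence. To address these problems, an improved deep deterministic policy gradient (DDPG) algorithm is proposed. First, a multi-layer long short-term memory (LSTM) neural network is introduced into the actor-critic network to control how much of the information in the experience pool is remembered or forgotten. Second, a central pattern generator (CPG) network is integrated into the reinforcement learning model by optimizing its feature parameters, and a new network state space and reward function are designed. Finally, the improved algorithm and the traditional algorithm are each deployed in the Webots environment for simulation experiments. Compared with the traditional algorithm, the improved algorithm reduces overall training time by 15% on average and the number of iterations needed to reach the target point by 22% on average. It is also trapped in dead zones less often while moving and converges noticeably faster. The proposed algorithm can therefore effectively guide the snake-like robot around obstacles, offering a new approach to path planning for such robots in complex environments.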
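The abstract says a CPG network is fused into the reinforcement learning model through optimized feature parameters, but it does not state which oscillator model is used. The following is a minimal sketch under that gap. It assumes a chain of Kuramoto-style coupled phase oscillators, one per joint. The parameter names `freq`, `lag`, `amplitude`, and `offset` are hypothetical, and so are the functions `cpg_step` and `joint_angles`. The idea is that a policy would output a few such parameters instead of raw joint angles. This is illustrative only and is not the authors' network.

```python
import math

def cpg_step(phases, freq, lag, coupling, dt):
    """Advance a chain of coupled phase oscillators by one explicit Euler step.

    Each oscillator runs at 2*pi*freq rad/s and is pulled toward trailing its
    head-side neighbour by `lag` rad, which produces a travelling body wave.
    """
    n = len(phases)
    new_phases = []
    for i in range(n):
        dphi = 2.0 * math.pi * freq
        if i > 0:      # head-side neighbour: want phases[i-1] - phases[i] == lag
            dphi += coupling * math.sin(phases[i - 1] - phases[i] - lag)
        if i < n - 1:  # tail-side neighbour: want phases[i] - phases[i+1] == lag
            dphi += coupling * math.sin(phases[i + 1] - phases[i] + lag)
        new_phases.append(phases[i] + dt * dphi)
    return new_phases

def joint_angles(phases, amplitude, offset):
    """Map phases to joint set-points in rad; a nonzero offset bends the body to steer."""
    return [amplitude * math.sin(p) + offset for p in phases]

# Hypothetical six-joint snake, all oscillators starting in phase
phases = [0.0] * 6
for _ in range(2000):  # 20 s of simulated time at dt = 0.01 s
    phases = cpg_step(phases, freq=0.5, lag=0.5, coupling=2.0, dt=0.01)
angles = joint_angles(phases, amplitude=0.6, offset=0.1)
```

After the transient, neighbouring joints settle at the requested phase lag. Their set-points then oscillate within `offset ± amplitude`. Coupling the phases, rather than writing sine waves directly, lets the wave change smoothly when a policy changes `freq` or `lag` mid-run.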

Keywords: robot control; snake-like robot; improved DDPG algorithm; reinforcement learning; CPG network; Webots 3D simulation
Received: 2022-12-21
Revised: 2023-02-25
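The abstract introduces a multi-layer LSTM into the actor-critic network to control how much information is remembered or forgotten, but it gives no layer sizes or wiring. The following is a minimal sketch of the standard LSTM gate equations for a single hidden unit in plain Python. It shows the mechanism behind that claim: the forget gate decides how much of the previous cell state survives, and the input gate decides how much new content is written. The helpers `lstm_cell` and `make_params` and all weight values are hypothetical and are not the paper's network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell(x, h, c, params):
    """One step of a single-unit LSTM cell.

    x: list of input features; h, c: previous hidden and cell state (floats).
    params: dict mapping each gate name ('f', 'i', 'o', 'g') to a tuple
    (input weights, recurrent weight, bias).
    """
    def pre(gate):
        w, u, b = params[gate]
        return sum(wk * xk for wk, xk in zip(w, x)) + u * h + b

    f = sigmoid(pre("f"))    # forget gate: fraction of old cell state kept
    i = sigmoid(pre("i"))    # input gate: fraction of new candidate written
    o = sigmoid(pre("o"))    # output gate: how much of the cell is exposed
    g = math.tanh(pre("g"))  # candidate content
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new

def make_params(forget_bias):
    """Hypothetical weights: zero inputs, closed input gate, chosen forget bias."""
    zeros = [0.0, 0.0]
    return {"f": (zeros, 0.0, forget_bias), "i": (zeros, 0.0, -10.0),
            "o": (zeros, 0.0, 0.0), "g": (zeros, 0.0, 0.0)}

x = [0.3, -0.2]
_, c_keep = lstm_cell(x, h=0.0, c=1.0, params=make_params(10.0))     # cell state retained
_, c_forget = lstm_cell(x, h=0.0, c=1.0, params=make_params(-10.0))  # cell state erased
```

With a large positive forget bias the stored value passes through almost unchanged. With a large negative bias it is wiped. In a trained network these biases and weights are learned, so the degree of remembering depends on the input sequence.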

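The improved algorithm in this paper builds on DDPG, whose critic is trained toward a bootstrapped target computed with slowly updated target networks. The following is a minimal sketch of those two standard DDPG ingredients. The first is the critic target y = r + γ(1 - d)·Q'(s', μ'(s')). The second is the soft (Polyak) target update θ' ← τθ + (1 - τ)θ'. Network parameters are represented as flat lists of floats purely for illustration. The values γ = 0.99 and τ = 0.005 are common defaults assumed here, not settings reported in the paper.

```python
def td_target(reward, done, gamma, q_next):
    """Critic target y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    `q_next` is the target critic's value for the next state evaluated at the
    target actor's action; `done` is 1.0 at terminal steps, else 0.0.
    """
    return reward + gamma * (1.0 - done) * q_next

def soft_update(target_params, online_params, tau):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online_params, target_params)]

# Worked example with made-up numbers (not results from the paper)
y = td_target(reward=1.0, done=0.0, gamma=0.99, q_next=10.0)           # about 10.9
y_terminal = td_target(reward=1.0, done=1.0, gamma=0.99, q_next=10.0)  # 1.0, no bootstrap
target = soft_update([0.0, 0.0], [1.0, 2.0], tau=0.005)                # about [0.005, 0.01]
```

A small `tau` makes the target networks trail the online networks slowly. This stabilises the bootstrapped target at the cost of slower propagation of new value estimates.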