
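Research on a multi-mode gait system for biped robots based on hierarchical control with deep reinforcement learning
Cite this article:XU Yusong, SHANGGUAN Qianqian, AN Kang. Research on a multi-mode gait system for biped robots based on hierarchical control with deep reinforcement learning[J]. Journal of Shanghai Normal University (Natural Sciences), 2024, 53(2): 260-267
Authors:XU Yusong  SHANGGUAN Qianqian  AN Kang
Affiliation:College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418
Abstract:A multi-mode gait generation system for biped robots based on hierarchical control with deep reinforcement learning (DRL) is proposed. First, an advantage actor-critic framework is adopted as the high-level control policy, the proximal policy optimization (PPO) algorithm and the idea of curriculum learning (CL) are introduced to optimize the policy, and a proportional-derivative (PD) controller is designed as the low-level controller. Next, the robot's observation and action spaces are defined to parameterize the policy, and a gait-cycle reward function and a stepping function are designed based on the periodic nature of symmetric bipedal walking gaits. Finally, footstep sequences are generated to design multi-mode task scenarios, and the feasibility of the method is verified on the MuJoCo simulation platform. The results show that the method effectively improves the stability and generalization of biped robot walking in complex environments.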

Keywords:biped robot  gait planning  proximal policy optimization (PPO)  multi-mode task  curriculum learning (CL)
Received:2023-12-23

Research on multi-mode gait hierarchical control system of biped robot based on hierarchical control of deep reinforcement learning
XU Yusong,SHANGGUAN Qianqian,AN Kang. Research on multi-mode gait hierarchical control system of biped robot based on hierarchical control of deep reinforcement learning[J]. Journal of Shanghai Normal University(Natural Sciences), 2024, 53(2): 260-267
Authors:XU Yusong  SHANGGUAN Qianqian  AN Kang
Affiliation:College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China
Abstract:Current gait control methods for bipedal robots still show deficiencies in stability and generalization in complex scenarios. To address this, a multi-mode bipedal robot gait generation system based on hierarchical control using deep reinforcement learning (DRL) was proposed. First, an advantage actor-critic framework was employed as the high-level control policy, with the proximal policy optimization (PPO) algorithm and the concept of curriculum learning (CL) used to optimize the policy, and a proportional-derivative (PD) controller was designed as the low-level controller. Next, the robot's observation and action spaces were defined for policy parameterization. Leveraging the cyclic nature of symmetric bipedal walking gaits, a gait cycle reward function and a stepping function were devised. Finally, by generating footstep sequences, multi-mode task scenarios were designed, and the feasibility of the method was validated on the MuJoCo simulation platform. The results demonstrated that the proposed approach effectively enhanced the stability and generalization of bipedal robot walking in complex environments.
Keywords:bipedal robot  gait planning  proximal policy optimization (PPO)  multi-mode task  curriculum learning (CL)
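
The abstract describes a hierarchy in which a learned high-level policy sits above a low-level PD controller. As a rough illustration of that lower layer only, the sketch below converts joint position targets into torques. The abstract does not give the control law, gains, or torque limits, so the function name `pd_torque`, the zero target velocity, and all numeric values here are assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np


def pd_torque(q_target, q, q_dot, kp, kd, torque_limit=None):
    """Joint-space PD law: tau = kp * (q_target - q) - kd * q_dot.

    Assumes a zero target joint velocity (an assumption of this sketch).
    If torque_limit is given, torques are clipped to [-limit, +limit].
    """
    q_target = np.asarray(q_target, dtype=float)
    q = np.asarray(q, dtype=float)
    q_dot = np.asarray(q_dot, dtype=float)
    tau = kp * (q_target - q) - kd * q_dot
    if torque_limit is not None:
        tau = np.clip(tau, -torque_limit, torque_limit)
    return tau


# Hypothetical values: two joints, targets as a high-level policy might emit them.
q_target = np.array([0.10, -0.20])   # desired joint angles (rad)
q = np.array([0.00, 0.00])           # measured joint angles (rad)
q_dot = np.array([0.50, -0.50])      # measured joint velocities (rad/s)

tau = pd_torque(q_target, q, q_dot, kp=40.0, kd=2.0, torque_limit=5.0)
print(tau)
```

In a hierarchical setup of this kind, the policy would typically update `q_target` at a lower rate while a loop like this runs at every simulation step; the actual rates used in the paper are not stated in the abstract.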