Collaborative Deep Reinforcement Learning Method for Multi-Objective Parameter Tuning
Cite this article: LUO Senlin, WEI Jixun, LIU Xiaoshuang, PAN Limin. Collaborative Deep Reinforcement Learning Method for Multi-Objective Parameter Tuning[J]. Transactions of Beijing Institute of Technology, 2022, 42(9): 969-975.
Authors: LUO Senlin, WEI Jixun, LIU Xiaoshuang, PAN Limin
Affiliation: School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
Keywords: parameter tuning; multi-objective; reinforcement learning; automation system; collaboration
Received: 2021-08-23

Abstract: Joint optimization and tuning of multi-objective control parameters is key to keeping an automation system running efficiently and stably. Reinforcement learning is often used to build automated tuning agents that take over parameter tuning from human experts. Existing methods linearly combine multiple optimization objectives into a single objective with fixed weights and train a single agent with fixed tuning knowledge. When the environment causes the actual relationship between the objectives to deviate from the prior, such an agent cannot perceive the change or adapt its decisions, which limits the tuning effect. To address this problem, a collaborative deep reinforcement learning method for multi-objective parameter tuning is proposed. First, offline simulation is used to learn tuning knowledge for the objectives and to build multiple Double-DQN agents. Then, tuning-effect feedback is established online to perceive the actual relationship between the objectives and adjust the agents' collaboration strategy, achieving effective multi-objective parameter tuning. Experimental results on automatic train operation parameter tuning show that the method tunes both objectives, parking error and comfort, well, adapts to different train and track performance, can keep optimizing over time, and has considerable practical value.
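
The abstract does not give the update rules, so the following is only a minimal Python sketch of the two ideas it names. `double_dqn_target` is the standard Double-DQN target (the online network selects the next action, the target network evaluates it). `cooperative_action` and `update_weights` are hypothetical: they assume one agent per objective, a weighted sum of the agents' Q-values as the collaboration strategy, and a simple error-proportional weight adjustment as the online feedback. None of these names or rules is taken from the paper.

```python
import numpy as np


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Standard Double-DQN target for one transition.

    The online network's Q-values choose the next action;
    the target network's Q-values score that action.
    """
    if done:
        return float(reward)
    best_action = int(np.argmax(next_q_online))
    return float(reward + gamma * next_q_target[best_action])


def cooperative_action(q_values_per_agent, weights):
    """Hypothetical collaboration rule (assumption, not from the paper).

    q_values_per_agent has shape (n_agents, n_actions); each row holds one
    objective-specific agent's Q-values for the same candidate adjustments.
    """
    q = np.asarray(q_values_per_agent, dtype=float)
    w = np.asarray(weights, dtype=float)
    combined = w @ q  # weighted sum over agents, one score per action
    return int(np.argmax(combined))


def update_weights(weights, objective_errors, step=0.5):
    """Hypothetical online feedback rule (assumption, not from the paper).

    Moves the weights toward the normalized observed errors, so the
    objective that is currently worse gets more say in the next decision.
    """
    w = np.asarray(weights, dtype=float)
    e = np.asarray(objective_errors, dtype=float)
    total = e.sum()
    target = e / total if total > 0 else w
    new_w = (1.0 - step) * w + step * target
    return new_w / new_w.sum()


if __name__ == "__main__":
    # Two agents (e.g. parking error and comfort), two candidate adjustments.
    q_values = [[1.0, 0.0], [0.0, 2.0]]
    weights = np.array([0.5, 0.5])
    print(cooperative_action(q_values, weights))
    # Feedback: the first objective is currently three times worse.
    weights = update_weights(weights, [3.0, 1.0])
    print(weights)
```

The contrast with the fixed-weight baseline criticized in the abstract is that `weights` here changes after every observed tuning result instead of being set once before training.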