Application of Improved Q Learning Algorithm in Job Shop Scheduling Problem
Cite this article: Yejian Zhao, Yanhong Wang, Jun Zhang, Hongxia Yu, Zhongda Tian. Application of Improved Q Learning Algorithm in Job Shop Scheduling Problem[J]. Journal of System Simulation, 2022, 34(6): 1247-1258. DOI: 10.16182/j.issn1004731x.joss.21-0099
Authors: Yejian Zhao, Yanhong Wang, Jun Zhang, Hongxia Yu, Zhongda Tian
Affiliation: School of Artificial Intelligence, Shenyang University of Technology, Shenyang 110027, Liaoning, China
Funding: National Natural Science Foundation of China (61803273); Key Research and Development Program of Liaoning Province (2020JH2/10100041)
Abstract: To solve the job shop scheduling problem in a dynamic environment, a dynamic scheduling algorithm based on an improved Q-learning algorithm and dispatching rules is proposed. The state space of the dynamic scheduling algorithm is described with the concept of "urgency of remaining tasks"; a reward function is designed on the principle of "the higher the slack, the higher the penalty"; and the traditional Q-learning algorithm is improved by introducing an action selection strategy built on the Softmax function, which makes the probabilities of selecting different actions more nearly equal in the early stage of learning and curbs the greedy strategy's tendency to select sub-optimal actions in the later stage. Simulation results show that the performance indicator of the proposed scheduling algorithm improves by an average of about 6.5% over the unimproved algorithm, and by averages of about 38.3% and 38.9% over the IPSO and PSO algorithms respectively; the scheduling results are clearly better than those of conventional methods such as a single dispatching rule or a traditional optimization algorithm.

Keywords: reinforcement learning; Q-learning; dispatching rules; dynamic scheduling; job shop scheduling
Received: 2021-02-02

Application of Improved Q Learning Algorithm in Job Shop Scheduling Problem
Yejian Zhao,Yanhong Wang,Jun Zhang,Hongxia Yu,Zhongda Tian. Application of Improved Q Learning Algorithm in Job Shop Scheduling Problem[J]. Journal of System Simulation, 2022, 34(6): 1247-1258. DOI: 10.16182/j.issn1004731x.joss.21-0099
Authors: Yejian Zhao, Yanhong Wang, Jun Zhang, Hongxia Yu, Zhongda Tian
Affiliation:School of Artificial Intelligence, Shenyang University of Technology, Shenyang 110027, China
Abstract: Aiming at job shop scheduling in a dynamic environment, a dynamic scheduling algorithm based on an improved Q-learning algorithm and dispatching rules is proposed. The state space of the dynamic scheduling algorithm is described with the concept of "the urgency of remaining tasks", and a reward function with the principle of "the higher the slack, the higher the penalty" is designed. In view of the problem that the greedy strategy still selects sub-optimal actions in the later stage of learning, the traditional Q-learning algorithm is improved by introducing an action selection strategy based on the Softmax function, which makes the probabilities of selecting different actions more nearly equal in the early stage. Simulation results on 6 different test instances show that the performance indicator of the scheduling algorithm is improved by an average of about 6.5% compared with the unimproved algorithm, and by about 38.3% and 38.9% respectively compared with the IPSO and PSO algorithms; the results are significantly better than those of conventional methods such as a single dispatching rule or a traditional optimization algorithm.
Keywords: reinforcement learning; Q-learning; dispatching rules; dynamic scheduling; job shop scheduling
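
The Softmax-based action selection described in the abstract can be illustrated with a short sketch. Below is a minimal tabular Q-learning loop with Boltzmann (Softmax) exploration, written in Python. It is not the paper's implementation: the state encoding, the dispatching-rule names (SPT, FIFO, EDD, LPT), the temperature schedule, and the stubbed reward/transition are assumptions for illustration only. With all Q-values initialized to zero, the Softmax policy starts near-uniform (the "more equal early probabilities" property), and cooling the temperature shifts it toward exploitation in later episodes.

import numpy as np

def softmax_action(q_row, temperature):
    # Boltzmann (Softmax) action selection over one row of the Q-table.
    # High temperature -> near-uniform probabilities (more exploration);
    # low temperature  -> concentrates on the highest-valued action.
    prefs = q_row / temperature
    prefs -= prefs.max()                       # subtract max for numerical stability
    probs = np.exp(prefs)
    probs /= probs.sum()
    return np.random.choice(len(q_row), p=probs)

# Illustrative setup: states index discretized "urgency of remaining tasks"
# levels; actions index candidate dispatching rules (names are hypothetical).
RULES = ["SPT", "FIFO", "EDD", "LPT"]
N_STATES = 10
Q = np.zeros((N_STATES, len(RULES)))

alpha, gamma = 0.1, 0.9
temperature = 5.0                              # starts high so early choices are near-equal

for episode in range(200):
    state = np.random.randint(N_STATES)        # stand-in for the real shop state
    for step in range(50):
        action = softmax_action(Q[state], temperature)
        # In the real algorithm, reward and next state come from the shop
        # simulation; here they are stubbed. A slack-based penalty would
        # implement "the higher the slack, the higher the penalty".
        reward = -np.random.rand()
        next_state = np.random.randint(N_STATES)
        # Standard Q-learning update with a max over next-state actions.
        Q[state, action] += alpha * (
            reward + gamma * Q[next_state].max() - Q[state, action]
        )
        state = next_state
    temperature = max(0.1, temperature * 0.98)  # cool toward exploitation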