Police Patrol Path Planning Using Stackelberg Equilibrium Based Multiagent Reinforcement Learning
Cite this article: XIE Yi, GU Yi-jun. Police Patrol Path Planning Using Stackelberg Equilibrium Based Multiagent Reinforcement Learning[J]. Journal of Beijing Institute of Technology, 2017, 37(1): 93-99.
Authors: XIE Yi, GU Yi-jun
Institution: Department of Cyber Security, People's Public Security University of China, Beijing 100038, China
Foundation item: Fundamental Research Funds of the People's Public Security University of China (2014JKF01132)
Abstract: To address the limitations of existing patrol path planning algorithms, which can handle only two-player games and ignore the presence of attackers, a new multi-agent reinforcement learning algorithm is proposed. Given the distribution of attack targets, it plans optimal patrol paths for an arbitrary number of defenders and attackers. Because defenders and attackers do not choose their strategies simultaneously, the strong Stackelberg equilibrium is adopted as the basis for each agent's strategy selection. The algorithm was tested on several patrol missions, and quantitative and qualitative experimental results demonstrate its convergence and effectiveness.

Keywords: patrol path planning  strong Stackelberg equilibrium  multi-agent  reinforcement learning
Received: 2015-04-15

Police Patrol Path Planning Using Stackelberg Equilibrium Based Multiagent Reinforcement Learning
XIE Yi, GU Yi-jun. Police Patrol Path Planning Using Stackelberg Equilibrium Based Multiagent Reinforcement Learning[J]. Journal of Beijing Institute of Technology (Natural Science Edition), 2017, 37(1): 93-99.
Authors: XIE Yi, GU Yi-jun
Institution: Department of Cyber Security, People's Public Security University of China, Beijing 100038, China
Abstract: State-of-the-art algorithms simplify patrol path planning into a two-person game on a grid world and ignore the existence of attackers. To handle realistic patrol path planning, a novel multi-agent reinforcement learning algorithm was proposed. An optimal patrol path was planned in a setting where multiple defenders and attackers form a multi-target configuration. Considering that the defenders and attackers do not act simultaneously, a strong Stackelberg equilibrium was adopted as the basis for each agent's action selection in the proposed algorithm. To verify the proposed algorithm, several patrol missions were tested. The qualitative and quantitative results demonstrate the convergence and effectiveness of the algorithm.
Keywords: patrol path planning  strong Stackelberg equilibrium  multiagent  reinforcement learning
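
The abstract describes the action-selection rule only at a high level, and this record gives no pseudocode. Purely as a rough illustrative sketch of the idea, the Python snippet below computes a pure-strategy strong Stackelberg equilibrium over two payoff (or Q-value) matrices, with the follower breaking ties in the leader's favour. The names Q_def, Q_att, and strong_stackelberg_pure, as well as the restriction to pure leader strategies, are assumptions made here for illustration and are not taken from the paper (the actual algorithm may, for instance, use mixed leader strategies computed by linear programming).

import numpy as np

def strong_stackelberg_pure(Q_leader, Q_follower):
    # Illustrative sketch only; names and setup are assumptions, not the paper's code.
    # Q_leader[i, j]:   leader (defender) payoff for joint action (i, j)
    # Q_follower[i, j]: follower (attacker) payoff for the same joint action
    # For each leader action i the follower best-responds; under the *strong*
    # equilibrium convention, ties are broken in the leader's favour. The leader
    # then commits to the action with the highest resulting payoff.
    best_val, best_i, best_j = -np.inf, None, None
    for i in range(Q_leader.shape[0]):
        br = np.flatnonzero(Q_follower[i] == Q_follower[i].max())  # follower best-response set
        j = br[np.argmax(Q_leader[i, br])]                         # leader-favourable tie-break
        if Q_leader[i, j] > best_val:
            best_val, best_i, best_j = Q_leader[i, j], i, int(j)
    return best_i, best_j

# Toy 3x3 example with made-up payoffs (zero-sum only for simplicity).
Q_def = np.array([[3.0, 1.0, 0.0],
                  [2.0, 4.0, 1.0],
                  [0.0, 2.0, 5.0]])
Q_att = -Q_def
print(strong_stackelberg_pure(Q_def, Q_att))  # -> (1, 2)

In a Stackelberg-style multi-agent Q-learning loop, a selection rule of this kind would typically replace the greedy argmax over a single agent's Q-table, with one such payoff matrix maintained per state for each defender-attacker pairing; again, this is a plausible embedding rather than the paper's documented procedure.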
This article has been indexed by CNKI and other databases.