XIE Yi, GU Yi-jun. Police Patrol Path Planning Using Stackelberg Equilibrium Based Multiagent Reinforcement Learning[J]. Transactions of Beijing Institute of Technology, 2017, 37(1): 93-99. DOI: 10.15918/j.tbit1001-0645.2017.01.019


Police Patrol Path Planning Using Stackelberg Equilibrium Based Multiagent Reinforcement Learning


    Abstract: Existing patrol path planning algorithms simplify the problem into a two-person game on a grid world and ignore the existence of attackers. To handle realistic patrol path planning, a novel multi-agent reinforcement learning algorithm was proposed. Given the distribution of attack targets, the algorithm plans optimal patrol paths for an arbitrary number of defenders and attackers. Considering that defenders and attackers do not select their strategies simultaneously, a strong Stackelberg equilibrium was adopted as the basis for each agent's action selection. To verify the proposed algorithm, several patrol missions were tested. The qualitative and quantitative results demonstrate the convergence and effectiveness of the algorithm.
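
As a rough illustration of the strong Stackelberg action-selection rule described in the abstract, the following Python sketch picks a joint action for one defender (leader) and one attacker (follower) from their joint-action value tables, breaking the follower's ties in the defender's favour. This is a minimal pure-strategy simplification for intuition only, not the paper's algorithm; the function name strong_stackelberg_pure and the toy Q-value tables are illustrative assumptions.

    import numpy as np

    def strong_stackelberg_pure(Q_leader, Q_follower):
        # Q_leader, Q_follower: (n_leader_actions, n_follower_actions) arrays of
        # estimated joint-action values for the leader (defender) and the
        # follower (attacker). The leader commits first and the follower
        # best-responds; "strong" means the follower's ties are broken in the
        # leader's favour.
        best_pair, best_value = None, -np.inf
        for a in range(Q_leader.shape[0]):
            # follower's best responses to leader action a
            responses = np.flatnonzero(Q_follower[a] == Q_follower[a].max())
            # strong-equilibrium tie-breaking: assume the response best for the leader
            b = responses[np.argmax(Q_leader[a, responses])]
            if Q_leader[a, b] > best_value:
                best_pair, best_value = (a, b), Q_leader[a, b]
        return best_pair

    # toy joint Q-values: 2 defender actions x 3 attacker actions (made-up numbers)
    Q_d = np.array([[1.0, 0.2, 0.5],
                    [0.8, 0.9, 0.1]])
    Q_a = np.array([[0.3, 0.3, 0.9],
                    [0.4, 0.4, 0.2]])
    print(strong_stackelberg_pure(Q_d, Q_a))  # -> (1, 1)

In a multi-agent learning loop of the kind the abstract describes, a selection step like this would replace the usual greedy arg-max over a single agent's Q-values.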

     

