Abstract:
State-of-the-art algorithms simplify patrol path planning to a two-person game in a grid world and ignore the existence of attackers. To address realistic patrol path planning, a novel multi-agent reinforcement learning algorithm is proposed. The algorithm plans an optimal patrol path in a multi-target setting in which multiple defenders and multiple attackers interact. Because the defenders and attackers take their actions asynchronously, the proposed algorithm uses the strong Stackelberg equilibrium as the players' action-selection rule. To verify the algorithm, several patrol missions were tested. The qualitative and quantitative results demonstrate the convergence and effectiveness of the algorithm.
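To make the action-selection rule concrete, the following is a minimal sketch of a strong Stackelberg equilibrium in a single stage game. It assumes a defender acting as leader and an attacker acting as follower, payoff tables such as per-state Q-values indexed by (leader action, follower action), and pure leader strategies only, which is a simplification of the general mixed-strategy equilibrium. The function name `strong_stackelberg_pure` is hypothetical and is not the authors' implementation. "Strong" means that when the follower is indifferent among several best responses, it breaks the tie in the leader's favour.

```python
from typing import List, Optional, Tuple


def strong_stackelberg_pure(
    leader_q: List[List[float]],
    follower_q: List[List[float]],
    tol: float = 1e-9,
) -> Optional[Tuple[int, int]]:
    """Pure-strategy strong Stackelberg equilibrium of one stage game (illustrative sketch).

    leader_q[i][j] and follower_q[i][j] are the payoffs (e.g. Q-values) when the
    leader (assumed: defender) plays i and the follower (assumed: attacker) plays j.
    """
    best: Optional[Tuple[int, int]] = None
    best_value = float("-inf")
    for i, (l_row, f_row) in enumerate(zip(leader_q, follower_q)):
        # The follower observes the leader's committed action i and best-responds.
        top = max(f_row)
        responses = [j for j, v in enumerate(f_row) if abs(v - top) <= tol]
        # Strong equilibrium: ties among best responses are broken in the leader's favour.
        j = max(responses, key=lambda k: l_row[k])
        # The leader commits to the action whose induced outcome pays it the most.
        if l_row[j] > best_value:
            best_value = l_row[j]
            best = (i, j)
    return best


if __name__ == "__main__":
    defender = [[2.0, 0.0], [1.0, 3.0]]
    attacker = [[1.0, 1.0], [2.0, 0.0]]
    print(strong_stackelberg_pure(defender, attacker))
```

In this example the attacker is indifferent between its two actions when the defender plays action 0, so the strong tie-breaking rule selects the outcome (0, 0), worth 2 to the defender. A pessimistic (weak) tie-breaking rule would instead lead the defender to commit to action 1.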