We consider a scenario in which a team of two unmanned aerial vehicles (UAVs) pursues an evader UAV within an urban environment. Each agent has a limited field of view, which buildings can occlude. Additionally, the pursuer team has no prior knowledge of the evader's initial or final location, or of its behavior. Consequently, the team must gather information by searching the environment, then track the evader, and eventually intercept it. To solve this multi-player, partially-observable pursuit-evasion game, we develop a two-phase neuro-symbolic algorithm centered on the principle of bounded rationality. First, we devise an offline approach that uses deep reinforcement learning to progressively train adversarial policies for the pursuer team against fictitious evaders. This produces $k$ levels of rationality for each agent in preparation for the online phase. Then, we employ an online classification algorithm to determine a "best guess" of our current opponent from the set of iteratively-trained strategic agents and apply the corresponding best response. Using this schema, we improved average performance against a random evader in our environment.
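The online phase described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each candidate evader model maps a state to an action distribution, accumulates log-likelihoods of observed evader actions under each of the $k$ models, and applies the pursuer best response trained against the most likely level. All names (`KLevelSelector`, the model/response interfaces) are hypothetical.

```python
import math

class KLevelSelector:
    """Tracks which of k candidate evader models best explains observed moves.

    A sketch of the online classification step: evader_models[i] is an
    assumed level-i evader policy (state -> dict of action probabilities),
    and best_responses[i] is the pursuer policy trained against level i.
    """

    def __init__(self, evader_models, pursuer_best_responses):
        assert len(evader_models) == len(pursuer_best_responses)
        self.evader_models = evader_models
        self.best_responses = pursuer_best_responses
        self.log_likelihood = [0.0] * len(evader_models)

    def observe(self, state, evader_action):
        # Accumulate log-likelihood of the observed action under each model;
        # a small floor avoids log(0) for actions a model deems impossible.
        for i, model in enumerate(self.evader_models):
            p = max(model(state).get(evader_action, 1e-9), 1e-9)
            self.log_likelihood[i] += math.log(p)

    def best_guess(self):
        # "Best guess" of the opponent's rationality level so far.
        return max(range(len(self.log_likelihood)),
                   key=self.log_likelihood.__getitem__)

    def act(self, state):
        # Apply the best player response for the guessed level.
        return self.best_responses[self.best_guess()](state)
```

For example, two toy evader models that mostly move left or mostly move right can be distinguished after a few observations, after which the selector switches to the matching pursuer response.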