The target defense problem (TDP) for unmanned surface vehicles (USVs) concerns intercepting an adversarial USV before it breaches a designated target region, using one or more defending USVs. A particularly challenging scenario arises when the attacker exhibits superior maneuverability compared to the defenders, significantly complicating effective interception. To tackle this challenge, this letter introduces ARBoids, a novel adaptive residual reinforcement learning framework that integrates deep reinforcement learning (DRL) with the biologically inspired, force-based Boids model. Within this framework, the Boids model serves as a computationally efficient baseline policy for multi-agent coordination, while DRL learns a residual policy to adaptively refine and optimize the defenders' actions. The proposed approach is validated in a high-fidelity Gazebo simulation environment, demonstrating superior performance over traditional interception strategies, including pure force-based approaches and vanilla DRL policies. Furthermore, the learned policy exhibits strong adaptability to attackers with diverse maneuverability profiles, highlighting its robustness and generalization capability. The code of ARBoids will be released upon acceptance of this letter.
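The residual structure described above, a force-based baseline refined by a learned correction, can be illustrated with a minimal sketch. The abstract does not specify the force terms, the way the residual is combined with the baseline, or how the adaptation is carried out, so everything below is an assumption made for illustration: the function names `boids_baseline` and `residual_action`, the attraction and separation terms, the weights, and the additive combination with a fixed `residual_scale` are all hypothetical and are not taken from the ARBoids implementation. The residual is passed in as a plain vector standing in for the output of a trained DRL policy.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float]


def _unit(dx: float, dy: float) -> Vec:
    """Return the unit vector along (dx, dy), or (0, 0) for a near-zero input."""
    norm = math.hypot(dx, dy)
    return (dx / norm, dy / norm) if norm > 1e-9 else (0.0, 0.0)


def boids_baseline(defender: Vec, attacker: Vec, teammates: Sequence[Vec],
                   w_attract: float = 1.0, w_separate: float = 0.5) -> Vec:
    """Hypothetical force-based baseline (not the paper's exact model).

    Sums an attraction force toward the attacker and a separation force
    away from each teammate, then normalizes to a unit heading command.
    """
    ax, ay = _unit(attacker[0] - defender[0], attacker[1] - defender[1])
    fx, fy = w_attract * ax, w_attract * ay
    for mate in teammates:
        dx, dy = defender[0] - mate[0], defender[1] - mate[1]
        dist = math.hypot(dx, dy)
        if dist > 1e-9:
            # Separation grows as teammates get closer (assumed 1/dist falloff).
            fx += w_separate * (dx / dist) / dist
            fy += w_separate * (dy / dist) / dist
    return _unit(fx, fy)


def residual_action(baseline: Vec, residual: Vec,
                    residual_scale: float = 0.5) -> Vec:
    """Add a scaled residual to the baseline and clip to unit norm.

    `residual` stands in for the output of a learned DRL policy; the fixed
    `residual_scale` is an assumption, not the paper's adaptive mechanism.
    """
    cx = baseline[0] + residual_scale * residual[0]
    cy = baseline[1] + residual_scale * residual[1]
    norm = math.hypot(cx, cy)
    if norm > 1.0:
        cx, cy = cx / norm, cy / norm
    return (cx, cy)


# Usage: one defender at the origin, attacker to the east, no teammates.
base = boids_baseline((0.0, 0.0), (10.0, 0.0), [])
cmd_zero = residual_action(base, (0.0, 0.0))   # zero residual keeps the baseline
cmd_left = residual_action(base, (0.0, 1.0))   # residual steers the heading toward +y
```

With a zero residual the combined command reduces to the baseline, which is the usual motivation for residual formulations: the learned policy only has to supply corrections where the force-based rule falls short, rather than learn interception from scratch.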