On Swarm Leader Identification using Probing Policies

Identifying the leader within a robotic swarm is crucial, especially in adversarial contexts where concealing the leader is necessary for mission success. This work introduces the interactive Swarm Leader Identification (iSLI) problem, in which an adversarial probing agent identifies a swarm's leader by physically interacting with its members. We formulate iSLI as a Partially Observable Markov Decision Process (POMDP) and use Deep Reinforcement Learning, specifically Proximal Policy Optimization (PPO), to train the prober's policy. The policy uses a novel neural network architecture that combines a Timed Graph Relationformer (TGR) layer with a Simplified Structured State Space Sequence (S5) model. The TGR layer processes graph-based observations of the swarm, capturing temporal dependencies and fusing relational information through a learned gating mechanism to produce informative representations for policy learning. Extensive simulations show that the TGR-based model outperforms baseline graph neural network architectures and generalizes zero-shot to swarm sizes and speeds that differ from those used in training. The trained prober identifies the leader with high accuracy, maintains its performance even in out-of-distribution scenarios, and shows appropriate confidence in its predictions. Real-world experiments with physical robots further validate the approach, confirming successful sim-to-real transfer and robustness to dynamic changes such as unexpected agent disconnections.
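The abstract names PPO as the training algorithm but gives no details of the objective. For reference, the sketch below implements the standard PPO clipped surrogate loss in NumPy. It is not the authors' code: the function name `ppo_clip_loss`, the default `clip_eps=0.2`, and the assumption that advantages are precomputed (for example with GAE) are illustrative choices. In the POMDP setting the log-probabilities would be conditioned on the prober's observation history rather than on a single state; how that history is encoded (here, by the TGR and S5 components) does not change the form of the loss.

```python
import numpy as np


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimized).

    Hypothetical helper for illustration, not taken from the paper.
    logp_new:   log pi_theta(a_t | history_t) under the policy being updated
    logp_old:   log pi_theta_old(a_t | history_t) recorded during the rollout
    advantages: advantage estimates A_t (assumed precomputed, e.g. by GAE)
    """
    # Probability ratio r_t(theta) = pi_theta / pi_theta_old, computed in log space.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    # Clipping keeps the update from moving the ratio outside [1 - eps, 1 + eps].
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the mean of the elementwise minimum; negate it to get a loss.
    return -np.mean(np.minimum(unclipped, clipped))
```

With a ratio of 1.5 and advantages of +1 and -1, the positive-advantage term is clipped to 1.2 while the negative-advantage term keeps its unclipped value of -1.5. The pessimistic minimum is what prevents overly large policy updates.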
