What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning?

Various Multi-Agent Reinforcement Learning (MARL) methods have been developed under the assumption that agents' policies are based on true states. Recent works have improved the robustness of MARL under uncertainties in the reward, the transition probability, or other partners' policies. However, in real-world multi-agent systems, state estimates may be perturbed by sensor measurement noise or even by adversaries. Policies trained with only true state information will deviate from optimal solutions when they face adversarial state perturbations during execution, and MARL under adversarial state perturbations has received limited study. Hence, in this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to study the fundamental properties of MARL under state uncertainties. We prove that the optimal agent policy and the robust Nash equilibrium do not always exist for an SAMG. Instead, we define a solution concept for the proposed SAMG under adversarial state perturbations, the robust agent policy, in which agents aim to maximize the worst-case expected state value. We then design a gradient descent ascent-based robust MARL algorithm to learn robust policies for the MARL agents. Our experiments show that adversarial state perturbations decrease agents' rewards for several baselines from the existing literature, while our algorithm outperforms the baselines under state perturbations and significantly improves the robustness of MARL policies under state uncertainties.
