Distributionally Robust Markov Games with Average Reward

We study distributionally robust Markov games (DR-MGs) with the average-reward criterion, a crucial framework for multi-agent decision-making under uncertainty over extended horizons. We first establish a connection between the best-response policies and the optimal policies of the induced single-agent problems. Under a standard irreducibility assumption, we derive a correspondence between the optimal policies and the solutions of the robust Bellman equation, and use these results to establish the existence of a stationary Nash equilibrium (NE). We also study the more general weakly communicating setting. There, we construct a set-valued map whose values are subsets of the best-response policies, show that it is convex-valued and upper hemicontinuous, and deduce the existence of an NE. We then introduce Robust Nash-Iteration and provide convergence guarantees for it. Finally, we connect average-reward NE to discounted robust equilibria, showing that the two approximate each other as the discount factor approaches one. Our results provide a comprehensive theoretical and algorithmic foundation for decision-making in complex, uncertain, and long-running multi-player environments.
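
To make the induced single-agent problem concrete, here is a minimal sketch on a hypothetical two-state, two-player toy instance. Fixing player 2 to a stationary policy pi2 leaves player 1 with a robust average-reward MDP whose Bellman equation reads g + h(s) = max_{a1} sum_{a2} pi2(a2|s) [ r1(s,a1,a2) + min_{p in P(s,a1,a2)} sum_{s'} p(s') h(s') ], and a greedy policy for a solution (g, h) is a best response. The sketch assumes finite, (s,a)-rectangular uncertainty sets over joint actions in which every candidate kernel has full support (so every induced chain is irreducible and aperiodic). It solves the equation with standard robust relative value iteration, which is a swapped-in textbook technique and not the paper's Robust Nash-Iteration, and then checks numerically that (1 - gamma) V_gamma approaches g as gamma approaches one. All names (R1, P_TO_0, PI2, make_uncertainty, robust_rvi, robust_discounted_vi) and all numbers are illustrative assumptions.

```python
# Hypothetical toy instance (not from the paper): player 1's robust best response
# when player 2 is fixed to a stationary policy PI2. Uncertainty sets are ASSUMED
# finite and (s, a)-rectangular over joint actions (a1, a2), with full support.

def make_uncertainty(p_to_0, radius):
    """Hypothetical finite set: nominal P(next state = 0) perturbed by +/- radius."""
    return [[p, 1.0 - p] for p in (p_to_0 - radius, p_to_0, p_to_0 + radius)]

R1 = [[[1.0, 0.0], [0.0, 1.0]],      # R1[s][a1][a2]: player 1's reward
      [[0.5, 0.5], [0.2, 0.8]]]
P_TO_0 = [[[0.8, 0.4], [0.3, 0.6]],  # nominal P(next = 0 | s, a1, a2)
          [[0.5, 0.2], [0.7, 0.4]]]
P = [[[make_uncertainty(p, 0.1) for p in row] for row in s_rows] for s_rows in P_TO_0]
PI2 = [[0.5, 0.5], [0.3, 0.7]]       # PI2[s][a2]: player 2's fixed stationary policy


def robust_q(h, s, a1, gamma):
    """Reward plus worst-case continuation, averaged over player 2's action."""
    total = 0.0
    for a2, w in enumerate(PI2[s]):
        # Nature picks the kernel minimizing the continuation value; with
        # rectangular sets this minimization separates across a2.
        worst = min(sum(p[t] * h[t] for t in range(len(h))) for p in P[s][a1][a2])
        total += w * (R1[s][a1][a2] + gamma * worst)
    return total


def robust_bellman(h, gamma=1.0):
    """Apply the robust Bellman operator; return new values and a greedy policy."""
    values, policy = [], []
    for s in range(len(h)):
        q = [robust_q(h, s, a1, gamma) for a1 in range(len(R1[s]))]
        best = max(range(len(q)), key=q.__getitem__)
        values.append(q[best])
        policy.append(best)
    return values, policy


def robust_rvi(ref=0, tol=1e-10, max_iter=10_000):
    """Relative value iteration for the average-reward equation g + h = T h."""
    h = [0.0] * len(R1)
    for _ in range(max_iter):
        th, policy = robust_bellman(h)
        g = th[ref]
        new_h = [v - g for v in th]  # normalize so that h[ref] = 0
        if max(abs(a - b) for a, b in zip(new_h, h)) < tol:
            return g, new_h, policy
        h = new_h
    raise RuntimeError("relative value iteration did not converge")


def robust_discounted_vi(gamma, tol=1e-9, max_iter=1_000_000):
    """Robust value iteration for the discounted counterpart of the same problem."""
    v = [0.0] * len(R1)
    for _ in range(max_iter):
        new_v, _ = robust_bellman(v, gamma)
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    raise RuntimeError("value iteration did not converge")


g, h, best_response = robust_rvi()
print("robust gain g =", round(g, 4), "bias h =", [round(x, 4) for x in h])
print("best response of player 1:", best_response)

scaled = {}  # gamma -> (1 - gamma) * V_gamma, which should approach g
for gamma in (0.9, 0.99, 0.999):
    v = robust_discounted_vi(gamma)
    scaled[gamma] = [(1 - gamma) * x for x in v]
    print(f"gamma={gamma}: (1-gamma)V =", [round(x, 4) for x in scaled[gamma]])
```

In this construction one can check directly that w = g/(1 - gamma) + h satisfies |T_gamma w - w| <= (1 - gamma) max|h|, so the printed gap |(1 - gamma) V_gamma(s) - g| is at most 2 (1 - gamma) max|h| and shrinks as gamma grows. The sketch only illustrates the single-agent side of the discounted/average-reward link; it does not compute an equilibrium.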
