No-Regret学习动态收敛问题的研究 (On the Convergence of No-Regret Learning Dynamics in Time-Varying Games)

Most of the literature on learning in games has focused on the restrictive setting where the underlying repeated game does not change over time. Much less is known about the convergence of no-regret learning algorithms in dynamic multiagent settings. In this paper, we characterize the convergence of optimistic gradient descent (OGD) in time-varying games. Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games parameterized on natural variation measures of the sequence of games, subsuming known results for static games. Furthermore, we establish improved second-order variation bounds under strong convexity-concavity, as long as each game is repeated multiple times. Our results also apply to time-varying general-sum multi-player games via a bilinear formulation of correlated equilibria, which has novel implications for meta-learning and for obtaining refined variation-dependent regret bounds, addressing questions left open in prior papers. Finally, we leverage our framework to also provide new insights on dynamic regret guarantees in static games.

翻译：大多数有关博弈中的学习文献都集中于限制性的设置，即基础重复游戏随时间不变。关于动态多智能体环境中No-Regret学习算法的收敛性，我们知之甚少。在本文中，我们描述了乐观梯度下降法（OGD）在时间变化游戏中的收敛性质。我们的框架针对OGD在以游戏序列的自然变化度量为参数的零和游戏中的均衡差提供了尖锐的收敛界限，包括静态游戏中已知的结果。此外，我们还建立起强凸-强凹的较好二阶变化界限，只要每个游戏重复多次。由于用相关均衡的二次型配方泛化到时间变化广义和多人游戏，我们的结果对元学习和获取精细变化依赖性遗憾界限始终保持新颖的重要性，解决了之前留下的问题。最后，我们利用我们的框架提供了静态游戏中的动态遗憾保证的新思路。