超越时间平均趋同:通过Clairvoyant多倍增效应更新近最佳、不相交的在线学习 (Beyond Time-Average Convergence: Near-Optimal Uncoupled Online Learning via Clairvoyant Multiplicative Weights Update)

In this paper, we provide a novel and simple algorithm, Clairvoyant Multiplicative Weights Updates (CMWU) for regret minimization in general games. CMWU effectively corresponds to the standard MWU algorithm but where all agents, when updating their mixed strategies, use the payoff profiles based on tomorrow's behavior, i.e. the agents are clairvoyant. CMWU achieves constant regret of $\ln(m)/\eta$ in all normal-form games with m actions and fixed step-sizes $\eta$. Although CMWU encodes in its definition a fixed point computation, which in principle could result in dynamics that are neither computationally efficient nor uncoupled, we show that both of these issues can be largely circumvented. Specifically, as long as the step-size $\eta$ is upper bounded by $\frac{1}{(n-1)V}$, where $n$ is the number of agents and $[0,V]$ is the payoff range, then the CMWU updates can be computed linearly fast via a contraction map. This implementation results in an uncoupled online learning dynamic that admits a $o (\log T)$-sparse sub-sequence where each agent experiences at most $O(nV\log m)$ regret. This implies that the CMWU dynamics converge with rate $O(nV \log mW( T) / T)$ to a Coarse Correlated Equilibrium where $W(T)$ is the inverse of the function $g(t):=t\cdot 2^t$. The latter improves on the current state-of-the-art convergence rate of uncoupled online learning dynamics.

翻译：在本文中, 我们提供了一个新颖和简单的算法, 即 Clairvoyant 倍增 Weights 更新( CMWU), 用于在普通游戏中最遗憾的最小化。 CMWU 有效地对应了标准的 MWU 算法, 但是当所有代理商在更新其混合战略时, 使用基于明天行为的补偿配置, 即代理商是 Clirvoyant 。 CMWU 在所有正常的游戏中, 以 m 动作和固定的分流大小 $\ 美元来提供新的算法。虽然 CMWU 在其定义中编码了一个固定点计算, 原则上可以导致既不具有计算效率, 也不是没有拼凑的动态。我们显示, 具体地说, 只要进步规模的 $\ 美元= mliver\ ( n1) / v) 美元, 美元是支付范围美元。那么 CMWU 更新可以通过收缩的地图进行不线性计算。这个执行结果是, 以美元 RO 的汇率更新。