This paper considers repeated games in which one player has more information about the game than the other players. In particular, we investigate repeated two-player zero-sum games where only the column player knows the payoff matrix A of the game. Suppose that while repeatedly playing this game, the row player chooses her strategy at each round by using a no-regret algorithm to minimize her (pseudo) regret. We develop a no-instant-regret algorithm for the column player to exhibit last round convergence to a minimax equilibrium. We show that our algorithm is efficient against a large set of popular no-regret algorithms of the row player, including the multiplicative weight update algorithm, the online mirror descent method/follow-the-regularized-leader, the linear multiplicative weight update algorithm, and the optimistic multiplicative weight update.
翻译:本文考虑了重复游戏, 其中一位玩家比其他玩家对游戏有更多的信息。 特别是, 我们调查了重复的双玩家零和游戏, 只有列玩家知道游戏的回报矩阵 A。 假设在反复玩这个游戏时, 行玩家在每回合选择策略, 使用无回报算法来尽量减少她的遗憾( 假想 ) 。 我们为列玩家开发了一个无即时- regret 算法, 以展示最后一轮接近微模马克斯平衡的情况。 我们显示我们的算法对行玩家的一大批流行的无回报算法是有效的, 包括倍增重更新算法、 在线镜下沉法/ 后续常规领导法、 线性倍增重算法更新算法, 以及乐观的倍增重更新法 。