Counterfactual Regret Minimization (CFR) is the most successful algorithm for finding approximate Nash equilibria in imperfect-information games. However, CFR's reliance on full game-tree traversals limits its scalability. For this reason, the game's state and action spaces are often abstracted (i.e., simplified) for CFR, and the resulting strategy is then translated back to the full game. This requires extensive expert knowledge and often converges to highly exploitable policies. A recently proposed method, Deep CFR, applies deep learning directly to CFR, allowing the agent to intrinsically abstract and generalize over the state space from samples, without requiring expert knowledge. In this paper, we introduce Single Deep CFR (SD-CFR), a simplified variant of Deep CFR that has a lower overall approximation error by avoiding the training of an average strategy network. We show that SD-CFR is more attractive from a theoretical perspective and empirically outperforms Deep CFR in head-to-head matches of a large poker game.
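As background for readers unfamiliar with CFR, the sketch below shows the regret-matching rule that tabular CFR applies at each information set to turn accumulated counterfactual regrets into a strategy. This is a minimal illustration of the classical update, not the deep, sampled variant discussed in the abstract; the function name is ours.

```python
def regret_matching(cumulative_regrets):
    """Map cumulative counterfactual regrets to a current strategy.

    Each action's probability is proportional to its positive cumulative
    regret; if no action has positive regret, play uniformly at random.
    """
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    n = len(cumulative_regrets)
    return [1.0 / n] * n

# Three actions with cumulative regrets [4, -2, 1]: the positive parts
# [4, 0, 1] normalize to the strategy [0.8, 0.0, 0.2].
strategy = regret_matching([4.0, -2.0, 1.0])
```

In vanilla CFR these per-iteration strategies are averaged over time to obtain the equilibrium approximation; Deep CFR trains a separate network to approximate that average, which is the step SD-CFR removes.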