Deep reinforcement learning (DRL) has attracted considerable attention in recent years and has been shown to play Atari games and Go at or above human level. However, those games have a small, fixed number of actions and can be handled with a simple CNN. In this paper, we study Dou Di Zhu, a card game popular in Asia, in which two adversarial groups of agents must consider numerous card combinations at every time step, leading to a huge action space. We propose a novel method to handle combinatorial actions, which we call combinational Q-learning (CQL). We employ a two-stage network to reduce the action space and leverage order-invariant max-pooling operations to extract relationships between primitive actions. Results show that our method prevails over state-of-the-art methods such as naive Q-learning and A3C. We develop an easy-to-use card game environment, train all agents adversarially from scratch with knowledge of only the game rules, and verify that our agents play comparably to humans. Our code to reproduce all reported results will be available online.
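To make the order-invariant pooling idea concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' released code: each candidate action is treated as a set of primitive card moves, each primitive is embedded, the embeddings are max-pooled across the set (so the representation is invariant to card order), and a small Q-head scores the pooled vector together with the encoded game state. All names and dimensions (CombinationQHead, CARD_TYPES, EMBED_DIM, the state encoding) are illustrative assumptions.

```python
import torch
import torch.nn as nn

CARD_TYPES = 15   # assumed: ranks 3..A, 2, and the two jokers in Dou Di Zhu
EMBED_DIM = 64

class CombinationQHead(nn.Module):
    """Scores combinatorial actions via order-invariant max-pooling (sketch)."""

    def __init__(self, state_dim: int):
        super().__init__()
        self.card_embed = nn.Embedding(CARD_TYPES, EMBED_DIM)
        self.q = nn.Sequential(
            nn.Linear(state_dim + EMBED_DIM, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state: torch.Tensor, combos: torch.Tensor,
                mask: torch.Tensor) -> torch.Tensor:
        # state:  (B, state_dim)   encoded game observation
        # combos: (B, A, K) long   A candidate combinations, each up to K cards
        # mask:   (B, A, K) bool   True where a card slot holds a real card
        emb = self.card_embed(combos)                          # (B, A, K, E)
        # Pad slots get a large negative value so they never win the max.
        emb = emb.masked_fill(~mask.unsqueeze(-1), -1e9)
        pooled = emb.max(dim=2).values                         # (B, A, E), order-invariant
        s = state.unsqueeze(1).expand(-1, combos.size(1), -1)  # (B, A, state_dim)
        return self.q(torch.cat([s, pooled], dim=-1)).squeeze(-1)  # (B, A) Q-values

# Usage with random placeholder data: batch of 2 states, 5 candidate combos
# of up to 4 cards each.
net = CombinationQHead(state_dim=128)
q_values = net(torch.randn(2, 128),
               torch.randint(0, CARD_TYPES, (2, 5, 4)),
               torch.ones(2, 5, 4, dtype=torch.bool))
```

Because max-pooling commutes with any permutation of the card slots, the same combination always maps to the same Q-value regardless of the order in which its primitive moves are listed, which is one way to realize the abstract's stated goal.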