Modelling Behavioural Diversity for Learning in Open-Ended Games

Promoting behavioural diversity is critical for solving games with non-transitive dynamics, where strategic cycles exist and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet a rigorous treatment of how to define diversity and construct diversity-aware learning dynamics is lacking. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on \emph{determinantal point processes} (DPP). By incorporating the diversity metric into best-response dynamics, we develop \emph{diverse fictitious play} and \emph{diverse policy-space response oracle} for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric is guaranteed to enlarge the \emph{gamescape} -- convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve much lower exploitability than state-of-the-art solvers by finding effective and diverse strategies.
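To make the idea of a DPP-based diversity score concrete, here is a minimal sketch in Python. It rests on assumptions that the abstract does not spell out: each strategy is represented by its row of payoffs in the matrix `M`, the DPP kernel is taken to be `L = M M^T`, and diversity is scored by the expected number of items in a sample from that DPP, `tr(I - (L + I)^{-1})`. The function name `expected_cardinality` and the toy populations are hypothetical and chosen only for illustration.

```python
import numpy as np


def expected_cardinality(payoffs: np.ndarray) -> float:
    """Expected sample size of a DPP with kernel L = M M^T.

    Assumption: each row of `payoffs` is the payoff vector of one strategy
    in the population against a fixed set of opponent strategies.
    """
    kernel = payoffs @ payoffs.T
    identity = np.eye(kernel.shape[0])
    # E|Y| = tr(I - (L + I)^{-1}) = sum_i lambda_i / (1 + lambda_i)
    return float(np.trace(identity - np.linalg.inv(kernel + identity)))


# Rock-Paper-Scissors payoffs for the row player (win = 1, loss = -1).
RPS = np.array(
    [
        [0.0, -1.0, 1.0],   # rock
        [1.0, 0.0, -1.0],   # paper
        [-1.0, 1.0, 0.0],   # scissors
    ]
)

# A population that covers the whole cycle versus three copies of rock.
diverse_score = expected_cardinality(RPS)
redundant_score = expected_cardinality(RPS[[0, 0, 0]])

print(f"diverse population:   {diverse_score:.3f}")
print(f"redundant population: {redundant_score:.3f}")
```

Under these assumptions, the population containing rock, paper and scissors scores higher than the population of three identical rock strategies, because identical payoff rows are linearly dependent and add no volume to the kernel.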
