We study the emergence of cooperative behaviors in reinforcement learning agents by introducing a challenging competitive multi-agent soccer environment with continuous simulated physics. We demonstrate that decentralized, population-based training with co-play can lead to a progression in agents' behaviors: from random, to simple ball chasing, and finally to showing evidence of cooperation. Our study highlights several of the challenges encountered in large-scale multi-agent training in continuous control. In particular, we demonstrate that the automatic optimization of simple shaping rewards, not themselves conducive to cooperative behavior, can lead to long-horizon team behavior. We further apply an evaluation scheme, grounded in game-theoretic principles, that can assess agent performance in the absence of pre-defined evaluation tasks or human baselines.
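To make the abstract's "evaluation scheme grounded in game-theoretic principles" concrete, here is a minimal sketch of one standard instantiation: ranking agents by their expected payoff against a Nash equilibrium of the zero-sum meta-game induced by pairwise match outcomes (as in Nash averaging). The `nash_ranking` helper and the toy win-rate matrix are illustrative assumptions, not the paper's implementation; the sketch finds a Nash mixture by linear programming rather than the maximum-entropy equilibrium.

```python
import numpy as np
from scipy.optimize import linprog

def nash_ranking(payoff):
    """Rank agents by payoff against a Nash mixture of the zero-sum
    meta-game defined by an antisymmetric payoff matrix.

    payoff[i, j] = expected score of agent i against agent j,
    centred so that payoff == -payoff.T (e.g. win rate minus 0.5).
    """
    n = payoff.shape[0]
    # Variables: mixture weights p_1..p_n and the game value v.
    # Maximise v subject to payoff.T @ p >= v (the mixture p does at
    # least v against every pure opponent strategy).
    c = np.zeros(n + 1)
    c[-1] = -1.0                                    # linprog minimises, so minimise -v
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])  # -payoff.T @ p + v <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])  # weights sum to 1
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n + [(None, None)]      # v is unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    p = res.x[:n]
    # Score each agent by its expected payoff against the Nash mixture.
    return payoff @ p

# Toy data (assumed): agent 2 beats both others, agent 1 beats agent 0.
wins = np.array([[0.5, 0.3, 0.2],
                 [0.7, 0.5, 0.4],
                 [0.8, 0.6, 0.5]])
print(nash_ranking(wins - 0.5))  # highest score -> strongest agent
```

The appeal of such a scheme is exactly what the abstract claims: it needs only agents' head-to-head results, so performance can be assessed without pre-defined evaluation tasks or human baselines.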