Deep reinforcement learning (DRL) has achieved superhuman performance in complex video games such as StarCraft II and Dota 2. However, current DRL systems still struggle with multi-agent coordination, sparse rewards, and stochastic environments, among other challenges. To address these challenges, we use a football video game, Google Research Football (GRF), as our testbed and develop an end-to-end learning-based AI system, denoted TiKick, for this task. We first generate a large replay dataset from the self-play of single-agent experts obtained through league training. We then develop a distributed learning system and new offline algorithms to learn a powerful multi-agent AI from this fixed single-agent dataset. To the best of our knowledge, TiKick is the first learning-based AI system that can take over the multi-agent Google Research Football full game, whereas previous work could either control only a single agent or experiment only on toy academic scenarios. Extensive experiments further show that our pre-trained model can accelerate the training of modern multi-agent algorithms and that our method achieves state-of-the-art performance on various academic scenarios.
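To make the idea of learning a multi-agent controller from a fixed single-agent dataset concrete, the following is a minimal sketch and not the TiKick algorithm itself. It substitutes plain behavior cloning, the simplest offline baseline, for the new offline algorithms mentioned above, and it assumes a toy two-dimensional observation, two discrete actions, and a hypothetical rule-based expert in place of GRF observations and league-trained experts. All names (`SharedPolicy`, `make_toy_dataset`, `train_offline`, `joint_action`) are illustrative.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SharedPolicy:
    """Linear softmax policy whose weights are shared by every controlled player."""

    def __init__(self, obs_dim, n_actions):
        self.w = [[0.0] * obs_dim for _ in range(n_actions)]

    def probs(self, obs):
        logits = [sum(wi * oi for wi, oi in zip(row, obs)) for row in self.w]
        return softmax(logits)

    def act(self, obs):
        p = self.probs(obs)
        return max(range(len(p)), key=p.__getitem__)

    def bc_step(self, obs, action, lr):
        """One SGD step on the cross-entropy loss; returns the pre-update loss."""
        p = self.probs(obs)
        for a, row in enumerate(self.w):
            # Gradient of cross-entropy w.r.t. logit a is p[a] - onehot[a].
            grad = p[a] - (1.0 if a == action else 0.0)
            for i, oi in enumerate(obs):
                row[i] -= lr * grad * oi
        return -math.log(p[action])


def make_toy_dataset(n, rng):
    """Fixed (observation, action) pairs from a hypothetical single-agent expert."""
    data = []
    for _ in range(n):
        obs = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        data.append((obs, 0 if obs[0] > obs[1] else 1))
    return data


def train_offline(dataset, epochs=20, lr=0.5, seed=0):
    """Fit the shared policy on the fixed dataset; no environment interaction."""
    rng = random.Random(seed)
    policy = SharedPolicy(obs_dim=2, n_actions=2)
    samples = list(dataset)  # copy so the caller's dataset is left untouched
    for _ in range(epochs):
        rng.shuffle(samples)
        for obs, action in samples:
            policy.bc_step(obs, action, lr)
    return policy


def joint_action(policy, per_agent_obs):
    """Control several players by applying the shared policy to each observation."""
    return [policy.act(obs) for obs in per_agent_obs]


if __name__ == "__main__":
    rng = random.Random(1)
    policy = train_offline(make_toy_dataset(500, rng))
    print(joint_action(policy, [[0.8, -0.3], [-0.5, 0.4], [0.1, 0.9]]))
```

The point of the sketch is the data flow: the policy is trained only on logged single-agent experience, and multi-agent control arises from applying the same weights to each player's own observation. A practical system would replace the linear model and the behavior-cloning loss with the network and offline objective of the actual method.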