Distributional reinforcement learning (distributional RL) has seen empirical success in complex Markov Decision Processes (MDPs) in the setting of nonlinear function approximation. However, there are many different ways to leverage the distributional approach to reinforcement learning. In this paper, we propose GAN Q-learning, a novel distributional RL method based on generative adversarial networks (GANs), and analyze its performance in simple tabular environments as well as in OpenAI Gym. We empirically show that our algorithm leverages the flexibility and black-box approach of deep learning models while providing a viable alternative to traditional methods.