A Study on Overfitting in Deep Reinforcement Learning

Recent years have witnessed significant progress in deep Reinforcement Learning (RL). Empowered with large-scale neural networks, carefully designed architectures, novel training algorithms and massively parallel computing devices, researchers are able to attack many challenging RL problems. However, in machine learning, more training power comes with a potential risk of more overfitting. As deep RL techniques are being applied to critical problems such as healthcare and finance, it is important to understand the generalization behavior of the trained agents. In this paper, we conduct a systematic study of standard RL agents and find that they can overfit in various ways. Moreover, overfitting can happen "robustly": commonly used techniques in RL that add stochasticity do not necessarily prevent or detect overfitting. In particular, the same agents and learning algorithms can have drastically different test performance, even when all of them achieve optimal rewards during training. These observations call for more principled and careful evaluation protocols in RL. We conclude with a general discussion on overfitting in RL and a study of generalization behavior from the perspective of inductive bias.
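The gap between training reward and test reward described in the abstract can be illustrated with a deliberately tiny toy task. The sketch below is not the paper's experimental setup: the task, the `MemorizingAgent` class, and all helper names are hypothetical and chosen only for illustration. Each "level" has one rewarded action, and the agent simply memorizes the rewarded action for every level it trained on.

```python
import random

N_ACTIONS = 4  # assumed size of the toy action space


def correct_action(level_id):
    """Hidden rule of the toy task: each level has exactly one rewarded action."""
    return random.Random(level_id).randrange(N_ACTIONS)


def reward(level_id, action):
    """Reward 1.0 for the rewarded action of this level, 0.0 otherwise."""
    return 1.0 if action == correct_action(level_id) else 0.0


class MemorizingAgent:
    """Hypothetical tabular agent that stores the best action per seen level."""

    def __init__(self):
        self.table = {}

    def train(self, level_ids):
        for level in level_ids:
            # Exhaustive trial and error on each training level.
            self.table[level] = max(
                range(N_ACTIONS), key=lambda a: reward(level, a)
            )

    def act(self, level_id):
        # Unseen levels fall back to a fixed default action.
        return self.table.get(level_id, 0)


def mean_reward(agent, level_ids):
    """Average reward of the agent over a collection of levels."""
    levels = list(level_ids)
    return sum(reward(lv, agent.act(lv)) for lv in levels) / len(levels)


train_levels = range(0, 100)      # levels seen during training
test_levels = range(1000, 2000)   # disjoint held-out levels

agent = MemorizingAgent()
agent.train(train_levels)

if __name__ == "__main__":
    print("train reward:", mean_reward(agent, train_levels))
    print("test reward: ", mean_reward(agent, test_levels))
```

The agent reaches the maximum reward on every training level, yet on held-out levels it can only guess. Reporting reward on levels that were never used for training is what exposes the difference.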

Related content

Overfitting

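In AI, overfitting usually means that a machine learning model is too complex: it performs very well on the training set but disappointingly on the test set. Overfitting (also called overlearning) shows up as an algorithm that does well on training data but poorly on test data, that is, it generalizes badly. It arises during parameter fitting because the training data contain sampling error, and a complex model fits that sampling error along with the underlying signal.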
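A minimal sketch of this effect, assuming NumPy is available: polynomials of increasing degree are fitted to a few noisy samples of a sine curve, and the error on the training points is compared with the error on fresh points. The target function, noise level, sample sizes, and the helper name `train_test_mse` are all illustrative assumptions, not part of the original text.

```python
import numpy as np
from numpy.polynomial import Polynomial


def train_test_mse(degree, n_train=15, n_test=200, noise=0.3, seed=0):
    """Fit a polynomial of the given degree to noisy samples of sin(x).

    Returns (training MSE, test MSE). The same seed yields the same data
    for every degree, so results are comparable across degrees.
    """
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(-3.0, 3.0, n_train)
    y_train = np.sin(x_train) + rng.normal(0.0, noise, n_train)
    x_test = rng.uniform(-3.0, 3.0, n_test)
    y_test = np.sin(x_test) + rng.normal(0.0, noise, n_test)

    # Least-squares fit; the fixed domain keeps the fit well conditioned.
    model = Polynomial.fit(x_train, y_train, degree, domain=[-3.0, 3.0])

    train_mse = float(np.mean((model(x_train) - y_train) ** 2))
    test_mse = float(np.mean((model(x_test) - y_test) ** 2))
    return train_mse, test_mse


if __name__ == "__main__":
    for degree in (1, 3, 12):
        train_mse, test_mse = train_test_mse(degree)
        print(f"degree {degree:2d}: train MSE {train_mse:.4f}, "
              f"test MSE {test_mse:.4f}")
```

Raising the degree can only lower the training error, because a higher-degree polynomial can also reproduce the noise in the 15 training points. The test error does not share that guarantee, so the gap between the two numbers is the practical signature of overfitting.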
