Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results relative to the standard approach, which models expected values (expected RL). However, aside from convergence guarantees, there have been few theoretical results investigating the reasons behind the improvements that distributional RL provides. In this paper we begin to investigate this fundamental question by analyzing the differences between the two approaches in the tabular, linear approximation, and non-linear approximation settings. We prove that in many realizations of the tabular and linear approximation settings, distributional RL behaves exactly the same as expected RL. In the cases where the two methods do behave differently, distributional RL can in fact hurt performance. We conclude with an empirical analysis comparing distributional and expected RL methods in control settings with non-linear approximators, to tease apart where the improvements from distributional RL come from.
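As a minimal sketch of why such an equivalence can hold, consider the distributional Bellman operator of Bellemare et al. (2017); the notation below is our assumption for illustration, not reproduced from this paper. The operator acts on a return-distribution function $Z$ as

\[
(\mathcal{T}^{\pi} Z)(x,a) \overset{D}{=} R(x,a) + \gamma\, Z(X', A'),
\qquad X' \sim P(\cdot \mid x,a),\ A' \sim \pi(\cdot \mid X'),
\]

and taking expectations of both sides, writing $Q(x,a) = \mathbb{E}[Z(x,a)]$, recovers the standard expected Bellman operator:

\[
\mathbb{E}\big[(\mathcal{T}^{\pi} Z)(x,a)\big]
= \mathbb{E}[R(x,a)] + \gamma\, \mathbb{E}\big[Q(X',A')\big]
= (\mathcal{T}^{\pi} Q)(x,a).
\]

In the tabular setting the mean of the learned return distribution therefore obeys the same update as the expected-value estimate, which is the intuition behind the equivalence results stated above.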