We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution. Our key insight is that DRL algorithms can be decomposed as the combination of some statistical estimator and a method for imputing a return distribution consistent with that set of statistics. With this new understanding, we are able to provide improved analyses of existing DRL algorithms as well as construct a new algorithm (EDRL) based upon estimation of the expectiles of the return distribution. We compare EDRL with existing methods on a variety of MDPs to illustrate concrete aspects of our analysis, and develop a deep RL variant of the algorithm, ER-DQN, which we evaluate on the Atari-57 suite of games.
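To make the statistical-estimation view concrete, the sketch below illustrates expectile regression, the estimator underlying EDRL: the tau-expectile of a distribution is the minimiser of the asymmetric squared loss E[|tau - 1{z < e}| (z - e)^2], and tau = 0.5 recovers the mean. This is a minimal illustration of the statistic itself, not the paper's algorithm (which also requires imputing a return distribution consistent with the estimated expectiles); the helper names and hyperparameters (estimate_expectile, lr, steps) are hypothetical.

```python
import numpy as np

def expectile_loss_grad(samples, e, tau):
    """Gradient w.r.t. e of the asymmetric squared loss
    E[|tau - 1{z < e}| * (z - e)^2], whose minimiser is the
    tau-expectile of the sample distribution."""
    diff = samples - e
    weight = np.where(diff < 0.0, 1.0 - tau, tau)
    return -2.0 * np.mean(weight * diff)

def estimate_expectile(samples, tau, lr=0.1, steps=500):
    """Estimate the tau-expectile by gradient descent on the
    asymmetric squared loss (hypothetical helper; tau = 0.5
    yields the ordinary mean)."""
    e = float(np.mean(samples))
    for _ in range(steps):
        e -= lr * expectile_loss_grad(samples, e, tau)
    return e

# Toy usage: expectiles of a Gaussian stand-in for a return distribution.
rng = np.random.default_rng(0)
returns = rng.normal(loc=1.0, scale=2.0, size=10_000)
for tau in (0.1, 0.5, 0.9):
    print(f"tau={tau}: expectile ~ {estimate_expectile(returns, tau):.3f}")
```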