Although recurrent neural networks (RNNs) for reinforcement learning (RL) offer unique advantages in various settings, e.g., solving memory-dependent tasks and meta-learning, very few studies have demonstrated how RNNs can solve hierarchical RL by autonomously developing hierarchical control. In this paper, we propose a novel model-free RL framework called ReMASTER, which combines an off-policy actor-critic algorithm with a multiple-timescale stochastic recurrent neural network to solve memory-dependent and hierarchical tasks. We performed experiments on a challenging continuous control task and showed that: (1) the internal representations necessary for hierarchical control develop autonomously through exploratory learning; and (2) stochastic neurons in the RNN enable faster relearning when adapting to a new task that recomposes previously learned sub-goals.
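To make the "multiple-timescale stochastic recurrent neural network" concrete, the following is a minimal illustrative sketch in NumPy: neurons with small time constants react quickly to inputs, neurons with large time constants integrate over longer horizons, and Gaussian noise injected into the pre-activations makes the neurons stochastic. All function and variable names here are hypothetical and the details (noise model, time constants, activation) are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def mtsrnn_step(x, h_prev, u_prev, params, tau, sigma):
    """One step of a multiple-timescale stochastic RNN (illustrative only).

    tau   : per-neuron time constants (larger tau -> slower dynamics)
    sigma : std of Gaussian noise on pre-activations (stochastic neurons)
    """
    W_in, W_rec, b = params
    # Leaky integration: each neuron mixes its previous internal state
    # with the new drive at a rate set by its own time constant.
    u = (1.0 - 1.0 / tau) * u_prev + (1.0 / tau) * (W_in @ x + W_rec @ h_prev + b)
    # Stochastic neurons: additive Gaussian noise before the nonlinearity.
    h = np.tanh(u + sigma * rng.standard_normal(u.shape))
    return h, u

# Two neuron groups: 4 "fast" (tau = 2) and 2 "slow" (tau = 20) units.
n_in, n_h = 3, 6
tau = np.array([2.0] * 4 + [20.0] * 2)
params = (0.1 * rng.standard_normal((n_h, n_in)),   # W_in
          0.1 * rng.standard_normal((n_h, n_h)),    # W_rec
          np.zeros(n_h))                            # bias
h, u = np.zeros(n_h), np.zeros(n_h)
for t in range(10):
    h, u = mtsrnn_step(rng.standard_normal(n_in), h, u, params, tau, sigma=0.05)
print(h.shape)
```

The fast/slow split is what allows a hierarchy to emerge: slow units can come to encode sub-goal-level context while fast units handle moment-to-moment control.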