In this paper we study how transforming regular reinforcement learning environments into goal-conditioned environments lets agents learn to solve tasks autonomously, without external rewards. We show that an agent can learn to solve tasks by selecting its own goals in an environment-agnostic way, with training times comparable to externally guided reinforcement learning. Our method is independent of the underlying off-policy learning algorithm. Because the method is environment-agnostic, the agent does not value any goal above the others, which leads to unstable performance on individual goals. However, our experiments show that the average goal success rate improves and stabilizes. An agent trained with this method can be instructed to seek any observation made in the environment, enabling generic training of agents prior to specific use cases.
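To make the core idea concrete, the following is a minimal sketch of how a regular, reward-free environment might be wrapped into a goal-conditioned one in which the agent selects its own goals from observations it has already made. The environment, class names, and the uniform goal-sampling rule are illustrative assumptions, not the paper's actual implementation.

```python
import random


class ToyGridEnv:
    """Hypothetical 1-D grid environment with no external reward."""

    def __init__(self, size=5):
        self.size = size
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right); reward is always zero
        self.pos = max(0, min(self.size - 1, self.pos + action))
        return self.pos, 0.0, False


class GoalConditionedWrapper:
    """Sketch of a goal-conditioned transform: the agent's goal is a
    previously observed state, and the intrinsic reward is 1 when the
    current observation matches the goal (an assumed matching rule)."""

    def __init__(self, env):
        self.env = env
        self.seen = set()   # observations made so far; candidate goals
        self.goal = None

    def reset(self):
        obs = self.env.reset()
        self.seen.add(obs)
        # Self-selected goal: sample uniformly from past observations.
        self.goal = random.choice(sorted(self.seen))
        return obs, self.goal

    def step(self, action):
        obs, _, done = self.env.step(action)
        self.seen.add(obs)
        reward = 1.0 if obs == self.goal else 0.0  # intrinsic reward
        return obs, reward, done or reward == 1.0  # episode ends at goal
```

Because the wrapper is agnostic to the inner environment, any off-policy learner can be trained on the resulting `(observation, goal)` pairs, and at deployment time the goal can simply be set to the observation one wants the agent to reach.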