Model-based methods have recently shown promise for offline reinforcement learning (RL), which aims to learn good policies from historical data without interacting with the environment. Previous model-based offline RL methods learn fully connected networks as world models that map states and actions to next-step states. However, a world model should arguably adhere to the underlying causal effect so that it can support learning an effective policy that generalizes well to unseen states. In this paper, we first provide theoretical results showing that causal world models can outperform plain world models for offline RL, by incorporating the causal structure into the generalization error bound. We then propose a practical algorithm, oFfline mOdel-based reinforcement learning with CaUsal Structure (FOCUS), to illustrate the feasibility of learning and leveraging causal structure in offline RL. Experimental results on two benchmarks show that FOCUS reconstructs the underlying causal structure accurately and robustly. Consequently, it outperforms plain model-based offline RL algorithms and other causal model-based RL algorithms.
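To make the contrast between the two model classes concrete, here is a minimal sketch, not the paper's implementation: it compares a plain fully connected world model with a causally masked one. All dimensions, hidden sizes, and the hand-supplied binary mask are illustrative assumptions; FOCUS learns the structure from data rather than taking it as given.

```python
# Sketch only: contrasts a plain world model with a causally masked one.
# STATE_DIM, ACTION_DIM, HIDDEN, and the mask are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HIDDEN = 4, 2, 64
IN_DIM = STATE_DIM + ACTION_DIM

class PlainWorldModel(nn.Module):
    """Maps (s, a) -> next state with one fully connected network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(IN_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, STATE_DIM),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

class CausalWorldModel(nn.Module):
    """One small network per next-state dimension; each sees only the
    inputs that its row of the (here, given) causal mask allows."""
    def __init__(self, mask):
        super().__init__()
        self.mask = mask  # (STATE_DIM, IN_DIM) binary parent matrix
        self.nets = nn.ModuleList(
            nn.Sequential(nn.Linear(IN_DIM, HIDDEN), nn.ReLU(),
                          nn.Linear(HIDDEN, 1))
            for _ in range(STATE_DIM)
        )

    def forward(self, s, a):
        x = torch.cat([s, a], dim=-1)
        # Zero out non-parent inputs before predicting each dimension.
        outs = [net(x * self.mask[i]) for i, net in enumerate(self.nets)]
        return torch.cat(outs, dim=-1)

if __name__ == "__main__":
    mask = torch.ones(STATE_DIM, IN_DIM)  # stand-in; FOCUS learns this
    s, a = torch.randn(8, STATE_DIM), torch.randn(8, ACTION_DIM)
    print(CausalWorldModel(mask)(s, a).shape)  # torch.Size([8, 4])
```

The masked variant restricts each next-state dimension to its causal parents, which is the structural constraint the generalization argument above relies on.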