Model-based reinforcement learning (RL) has proven to be a data-efficient approach for learning control tasks but is difficult to utilize in domains with complex observations such as images. In this paper, we present a method for learning representations that are suitable for iterative model-based policy improvement, in that these representations are optimized for inferring simple dynamics and cost models given data from the current policy. This enables a model-based RL method based on the linear-quadratic regulator (LQR) to be used for systems with image observations. We evaluate our approach on a suite of robotics tasks, including manipulation tasks on a real Sawyer robot arm directly from images, and we find that our method results in better final performance than other model-based RL methods while being significantly more efficient than model-free RL. Videos of our results are available at https://sites.google.com/view/icml19solar
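To make the LQR component of this pipeline concrete, the following is a minimal sketch of a standard finite-horizon LQR backward pass on an already-fitted linear model x' = A x + B u with quadratic cost x^T Q x + u^T R u. It is an illustrative, assumed setup (plain NumPy, hypothetical matrices A, B, Q, R), not the paper's actual latent-space model or implementation.

```python
import numpy as np

def lqr_backward_pass(A, B, Q, R, horizon):
    """Finite-horizon LQR backward recursion for dynamics x' = A x + B u
    and cost x^T Q x + u^T R u. Returns time-indexed feedback gains K_t
    such that u_t = K_t @ x_t."""
    P = Q.copy()          # terminal cost-to-go matrix
    gains = []
    for _ in range(horizon):
        # Riccati recursion: solve for the optimal feedback gain at this step.
        BtP = B.T @ P
        K = -np.linalg.solve(R + BtP @ B, BtP @ A)
        # Update the cost-to-go matrix under the closed-loop dynamics.
        A_cl = A + B @ K
        P = Q + K.T @ R @ K + A_cl.T @ P @ A_cl
        gains.append(K)
    return gains[::-1]    # reorder so gains[t] applies at time step t

# Toy usage on a hypothetical double-integrator-style system.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = 0.01 * np.eye(1)
gains = lqr_backward_pass(A, B, Q, R, horizon=50)
x = np.array([1.0, 0.0])
for K in gains:
    u = K @ x
    x = A @ x + B @ u   # roll the system forward under the LQR policy
print("final state:", x)
```

In the setting the abstract describes, the (A, B) dynamics and quadratic cost would be fit to data from the current policy in the learned representation space before a step like this is applied.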