Real World Offline Reinforcement Learning with Realistic Data Source

Offline reinforcement learning (ORL) holds great promise for robot learning due to its ability to learn from arbitrary pre-generated experience. However, current ORL benchmarks are almost entirely in simulation and rely on contrived datasets, such as replay buffers of online RL agents or sub-optimal trajectories, and thus hold limited relevance for real-world robotics. In this work (Real-ORL), we posit that data collected from safe operations of closely related tasks are more practical data sources for real-world robot learning. Under these settings, we perform an extensive empirical study (6500+ trajectories collected over 800+ robot hours and 270+ human labor hours) evaluating the generalization and transfer capabilities of representative ORL methods on four real-world tabletop manipulation tasks. Our study finds that ORL and imitation learning prefer different action spaces, and that ORL algorithms can generalize by leveraging heterogeneous offline data sources and can outperform imitation learning. We release our dataset and implementations at https://sites.google.com/view/real-orl
