Unsupervised Reinforcement Learning for Transferable Manipulation Skill Discovery

From arXiv: 8 pages, 9 figures; accepted for publication in IEEE Robotics and Automation Letters (RA-L); supplementary video available at https://www.youtube.com/watch?v=bF3Y4WXfM7c&t=9s

Current reinforcement learning (RL) in robotics often has difficulty generalizing to new downstream tasks because of its inherently task-specific training paradigm. To alleviate this, unsupervised RL, a framework that pre-trains the agent in a task-agnostic manner without access to the task-specific reward, leverages active exploration to distill diverse experience into essential skills or reusable knowledge. To exploit these benefits in robotic manipulation as well, we propose an unsupervised method for transferable manipulation skill discovery that ties together structured exploration toward interacting behavior and transferable skill learning. It not only enables the agent to learn interaction behavior, the key aspect of robotic manipulation learning, without access to the environment reward, but also lets it generalize to arbitrary downstream manipulation tasks with the learned task-agnostic skills. Through comparative experiments, we show that our approach achieves the most diverse interacting behavior and significantly improves sample efficiency in downstream tasks, including the extension to multi-object, multi-task problems.
