Reinforcement learning (RL) approaches based on Markov Decision Processes (MDPs) are predominantly applied in the robot joint space, often relying on limited task-specific information and only partial awareness of the 3D environment. In contrast, episodic RL has demonstrated advantages over traditional MDP-based methods in terms of trajectory consistency, task awareness, and overall performance in complex robotic tasks. However, both step-wise and episodic RL methods often neglect the contact-rich information inherent in task-space manipulation, particularly with regard to contact safety and robustness. In this work, contact-rich manipulation tasks are tackled using a task-space, energy-safe framework, in which reliable and safe task-space trajectories are generated by combining Proximal Policy Optimization (PPO) with movement primitives. Furthermore, an energy-aware Cartesian impedance controller objective is incorporated into the proposed framework to ensure safe interaction between the robot and the environment. Our experimental results demonstrate that the proposed framework outperforms existing methods in handling tasks on various types of surfaces in 3D environments, achieving high success rates as well as smooth trajectories and energy-safe interactions.
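As a rough illustration of the pipeline outlined in the abstract, and not the paper's implementation, the sketch below shows an episodic setup in which a policy sample parameterizes a movement primitive over a canonical phase, the resulting task-space trajectory is tracked by a Cartesian impedance law, and the commanded stiffness is scaled whenever the stored elastic energy would exceed a fixed budget. All names and values (`N_BASIS`, `E_MAX`, `primitive_trajectory`, `energy_safe`, the gains) are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: RBF-based movement primitive + energy-limited Cartesian
# impedance law. Constants and function names are illustrative, not from the paper.
N_BASIS = 10   # radial basis functions per Cartesian axis
HORIZON = 200  # control steps per episode
E_MAX = 2.0    # assumed energy budget (J) for the tank-like safety check

def primitive_trajectory(weights, horizon=HORIZON):
    """Map episodic policy output (RBF weights, shape (3, N_BASIS))
    to a task-space trajectory of shape (horizon, 3)."""
    phase = np.linspace(0.0, 1.0, horizon)          # canonical phase variable
    centers = np.linspace(0.0, 1.0, N_BASIS)
    widths = 0.5 * N_BASIS
    psi = np.exp(-widths * (phase[:, None] - centers[None, :]) ** 2)
    psi /= psi.sum(axis=1, keepdims=True)           # normalized Gaussian basis
    return psi @ weights.T

def impedance_wrench(x, x_des, xdot, stiffness, damping):
    """Diagonal Cartesian impedance law: F = K (x_des - x) - D xdot."""
    return stiffness * (x_des - x) - damping * xdot

def energy_safe(stiffness, x, x_des, budget):
    """Scale the stiffness down if the stored spring energy exceeds the budget."""
    e_spring = 0.5 * np.sum(stiffness * (x_des - x) ** 2)
    return stiffness if e_spring <= budget else stiffness * (budget / e_spring)

# Example rollout: one "policy sample" of primitive weights yields a smooth path;
# gains are clipped step by step to respect the energy budget.
rng = np.random.default_rng(0)
w = 0.05 * rng.standard_normal((3, N_BASIS))
traj = primitive_trajectory(w)
x, xdot = np.zeros(3), np.zeros(3)
K, D = np.full(3, 400.0), np.full(3, 40.0)
for x_des in traj:
    K_safe = energy_safe(K, x, x_des, E_MAX)
    F = impedance_wrench(x, x_des, xdot, K_safe, D)
    # ... command F to the robot or simulator here ...
```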