Learning long-horizon tasks such as navigation remains a difficult challenge for applying reinforcement learning to robotics. In known environments, by contrast, sampling-based planning can robustly find collision-free paths without any learning. In this work, we propose Control Transformer, which models return-conditioned sequences from low-level policies guided by a sampling-based Probabilistic Roadmap (PRM) planner. We demonstrate that our framework can solve long-horizon navigation tasks using only local information. We evaluate our approach on partially observed maze navigation with MuJoCo robots, including Ant, Point, and Humanoid. We show that Control Transformer can successfully navigate through mazes and transfer to unknown environments. Additionally, we apply our method to a differential-drive robot (Turtlebot3) and show zero-shot sim2real transfer under noisy observations.
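For readers unfamiliar with the planner named above, the following is a minimal sketch of a generic Probabilistic Roadmap in a 2D unit square. It is not the paper's implementation. The circular obstacles, the function names (`collision_free`, `build_prm`, `shortest_path`), and all parameter values are assumptions chosen for illustration. The sketch samples collision-free points, connects nearby pairs whose straight segment is free, and runs Dijkstra's algorithm over the resulting graph.

```python
import heapq
import math
import random


def collision_free(p, q, obstacles, step=0.05):
    """Check the segment p->q against circular obstacles given as (x, y, r)."""
    n = max(1, int(math.dist(p, q) / step))
    for i in range(n + 1):
        t = i / n
        x = p[0] + t * (q[0] - p[0])
        y = p[1] + t * (q[1] - p[1])
        if any(math.hypot(x - ox, y - oy) <= r for ox, oy, r in obstacles):
            return False
    return True


def build_prm(start, goal, obstacles, n_samples=200, radius=0.3, seed=0):
    """Sample free points in the unit square and link neighbours within `radius`."""
    rng = random.Random(seed)
    nodes = [start, goal]  # index 0 is the start, index 1 is the goal
    while len(nodes) < n_samples + 2:
        p = (rng.random(), rng.random())
        if collision_free(p, p, obstacles):  # keep only points outside obstacles
            nodes.append(p)
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            d = math.dist(nodes[i], nodes[j])
            if d <= radius and collision_free(nodes[i], nodes[j], obstacles):
                edges[i].append((j, d))
                edges[j].append((i, d))
    return nodes, edges


def shortest_path(nodes, edges, src=0, dst=1):
    """Dijkstra over the roadmap; returns a list of waypoints or None."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in edges[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return [nodes[i] for i in reversed(path)]
```

Under these assumptions, calling `build_prm((0.1, 0.1), (0.9, 0.9), [(0.5, 0.5, 0.2)])` followed by `shortest_path(nodes, edges)` yields a list of waypoints around the obstacle. In a framework like the one described, such waypoints could plausibly serve as intermediate goals for a low-level policy, though how the paper uses them is not specified here.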