Tasks involving locally unstable or discontinuous dynamics (such as bifurcations and collisions) remain challenging in robotics, because small variations in the environment can have a significant impact on task outcomes. For such tasks, learning a robust deterministic policy is difficult. We focus on structuring exploration with multiple stochastic policies based on a mixture-of-experts (MoE) policy representation that can be efficiently adapted. The MoE policy is composed of stochastic sub-policies that allow exploration of multiple distinct regions of the action space (i.e., strategies) and a high-level selection policy that guides exploration towards the most promising regions. We develop a robot system to evaluate our approach in a real-world physical problem-solving domain. After training the MoE policy in simulation, online learning in the real world demonstrates efficient adaptation within just a few dozen attempts, with a minimal sim2real gap. Our results confirm that representing multiple strategies promotes efficient adaptation in new environments, and that strategies learned under different dynamics can still provide useful information about where to look for good strategies.