Training deep reinforcement learning (DRL) locomotion policies often requires massive amounts of data to converge to the desired behaviour. In this regard, simulators provide a cheap and abundant source. For successful sim-to-real transfer, exhaustively engineered approaches such as system identification, dynamics randomization, and domain adaptation are generally employed. As an alternative, we investigate a simple strategy of random force injection (RFI) to perturb system dynamics during training. We show that the application of random forces allows us to emulate dynamics randomization, yielding locomotion policies that are robust to variations in system dynamics. We further extend RFI by introducing an episodic actuation offset; we refer to this variant as extended random force injection (ERFI). We demonstrate that ERFI provides additional robustness to variations in system mass, offering on average a 53% performance improvement over RFI. We also show that ERFI is sufficient for successful sim-to-real transfer on two different quadrupedal platforms, ANYmal C and Unitree A1, even for perceptive locomotion over uneven terrain in outdoor environments.
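The core idea can be sketched as follows: RFI adds a random perturbation to the commanded actuation at every simulation step, while ERFI additionally adds an offset sampled once per episode and held fixed for its duration. This is a minimal illustrative sketch, not the paper's implementation; the function names, the uniform sampling distributions, and the bound values are assumptions for illustration.

```python
import numpy as np

NUM_ACTUATORS = 12  # assumed: 12 joint actuators, typical for a quadruped

def sample_episode_offset(rng, bound=0.5):
    """ERFI: sample an actuation offset once at episode start (assumed bound)."""
    return rng.uniform(-bound, bound, size=NUM_ACTUATORS)

def erfi_perturb(torques, episode_offset, rfi_bound=0.1, rng=None):
    """Apply per-step random force injection plus the fixed episodic offset.

    torques        -- commanded joint torques from the policy
    episode_offset -- offset held constant within the episode (zero => plain RFI)
    rfi_bound      -- assumed bound on the per-step uniform perturbation
    """
    rng = rng or np.random.default_rng()
    per_step_noise = rng.uniform(-rfi_bound, rfi_bound, size=torques.shape)
    return torques + per_step_noise + episode_offset

# Usage: perturb a zero-torque command during one training episode.
rng = np.random.default_rng(0)
episode_offset = sample_episode_offset(rng)
tau = np.zeros(NUM_ACTUATORS)
perturbed = erfi_perturb(tau, episode_offset, rng=rng)
```

Setting `episode_offset` to zeros recovers plain RFI; the episodic offset is what emulates systematic dynamics shifts such as a change in system mass.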