Training deep reinforcement learning (DRL) locomotion policies often requires massive amounts of data to converge to the desired behavior, and in this regard simulators provide a cheap and abundant source. For successful sim-to-real transfer, extensively engineered approaches such as system identification, dynamics randomization, and domain adaptation are generally employed. As an alternative, we investigate a simple strategy of random force injection (RFI) to perturb system dynamics during training. We show that the application of random forces allows us to emulate dynamics randomization, yielding locomotion policies that are robust to variations in system dynamics. We further extend RFI with an episodic actuation offset, which we refer to as extended random force injection (ERFI). We demonstrate that ERFI provides additional robustness to variations in system mass, offering on average 61% improved performance over RFI. We also show that ERFI is sufficient for a successful sim-to-real transfer on two different quadrupedal platforms, ANYmal C and Unitree A1, even for perceptive locomotion over uneven terrain in outdoor environments.
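To make the two perturbation schemes concrete, the following is a minimal sketch of RFI and its episodic extension ERFI wrapped around a generic simulator. The `sim` interface, the perturbation bounds, and all names here are illustrative assumptions, not the paper's implementation; the key distinction shown is that RFI resamples a perturbation at every control step, while ERFI additionally holds one actuation offset fixed for an entire episode.

```python
import numpy as np

class ERFISimWrapper:
    """Sketch of random force injection (RFI) and extended RFI (ERFI).

    Assumptions: `sim` exposes hypothetical reset()/step(torques) methods,
    and perturbations are applied in joint-torque space; the bounds below
    are placeholders, not values from the paper.
    """

    def __init__(self, sim, num_joints, rfi_scale=1.0, erfi_scale=1.0):
        self.sim = sim
        self.num_joints = num_joints
        self.rfi_scale = rfi_scale    # bound on the per-step random perturbation
        self.erfi_scale = erfi_scale  # bound on the per-episode actuation offset
        self.offset = np.zeros(num_joints)

    def reset(self):
        # ERFI: sample one actuation offset per episode and hold it fixed,
        # emulating a persistent dynamics shift such as a change in mass.
        self.offset = np.random.uniform(
            -self.erfi_scale, self.erfi_scale, self.num_joints)
        return self.sim.reset()

    def step(self, action_torques):
        # RFI: inject a freshly sampled random perturbation at every step,
        # which emulates dynamics randomization without identifying or
        # randomizing physical parameters explicitly.
        rfi = np.random.uniform(
            -self.rfi_scale, self.rfi_scale, self.num_joints)
        return self.sim.step(action_torques + rfi + self.offset)
```

In this reading, plain RFI corresponds to `erfi_scale=0`, so ERFI reduces to RFI when the episodic offset is disabled.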