Deep reinforcement learning has recently emerged as an appealing approach to legged locomotion over multiple terrains: a policy is trained in physics simulation and then transferred to the real world (i.e., sim-to-real transfer). Despite considerable progress, the capacity and scalability of traditional neural networks are still limited, which may hinder their application in more complex environments. In contrast, the Transformer architecture has proven superior in a wide range of large-scale sequence modeling tasks, including natural language processing and decision-making problems. In this paper, we propose Terrain Transformer (TERT), a high-capacity Transformer model for quadrupedal locomotion control on various terrains. Furthermore, to better leverage the Transformer in sim-to-real scenarios, we present a novel two-stage training framework consisting of an offline pretraining stage and an online correction stage, which naturally integrates the Transformer with privileged training. Extensive experiments in simulation demonstrate that TERT outperforms state-of-the-art baselines on different terrains in terms of return, energy consumption, and control smoothness. In further real-world validation, TERT successfully traverses nine challenging terrains, including a sand pit and descending stairs, which strong baselines fail to traverse.