We focus on the problem of learning a single motor module that can flexibly express a range of behaviors for the control of high-dimensional, physically simulated humanoids. To do this, we propose a motor architecture that has the general structure of an inverse model with a latent-variable bottleneck. We show that it is possible to train this model entirely offline to compress thousands of expert policies and learn a motor primitive embedding space. The trained neural probabilistic motor primitive system can perform one-shot imitation of whole-body humanoid behaviors, robustly mimicking unseen trajectories. Additionally, we demonstrate that it is straightforward to train controllers to reuse the learned motor primitive space to solve tasks, and that the resulting movements are relatively naturalistic. To support the training of our model, we compare two approaches for offline policy cloning, including an experience-efficient method we call linear feedback policy cloning. We encourage readers to view the supplementary video (https://youtu.be/CaDEf-QcKwA) summarizing our results.
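As a rough illustration of the inverse-model-with-latent-bottleneck structure described above, the sketch below shows the data flow only: an encoder compresses the current state plus a snippet of future reference states into a latent "motor intention" z, and a decoder (the shared motor module) maps z and the current state to an action. All layer sizes, names, and the use of untrained random-weight MLPs are hypothetical, for shape-level illustration; they are not the paper's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP parameters (illustrative only, untrained)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Tanh MLP forward pass with a linear final layer."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

# Hypothetical dimensions, chosen only for the example.
STATE_DIM, FUTURE_DIM, LATENT_DIM, ACTION_DIM = 32, 64, 8, 12

# Encoder: (current state, future reference states) -> latent z.
encoder = mlp([STATE_DIM + FUTURE_DIM, 128, LATENT_DIM])
# Decoder / shared motor module: (latent z, current state) -> action.
decoder = mlp([LATENT_DIM + STATE_DIM, 128, ACTION_DIM])

state = rng.standard_normal(STATE_DIM)
future = rng.standard_normal(FUTURE_DIM)

z = forward(encoder, np.concatenate([state, future]))  # latent "motor intention"
action = forward(decoder, np.concatenate([z, state]))  # low-level motor command

print(z.shape, action.shape)  # → (8,) (12,)
```

At imitation time, only the decoder need be retained: new high-level controllers can be trained to emit z directly, reusing the motor primitive space.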