Synthesizing graceful and life-like behaviors for physically simulated characters has been a fundamental challenge in computer animation. Data-driven methods that leverage motion tracking are a prominent class of techniques for producing high-fidelity motions for a wide range of behaviors. However, the effectiveness of these tracking-based methods often hinges on carefully designed objective functions, and when applied to large and diverse motion datasets, these methods require significant additional machinery to select the appropriate motion for the character to track in a given scenario. In this work, we propose to obviate the need to manually design imitation objectives and mechanisms for motion selection by utilizing a fully automated approach based on adversarial imitation learning. High-level task objectives that the character should perform can be specified by relatively simple reward functions, while the low-level style of the character's behaviors can be specified by a dataset of unstructured motion clips, without any explicit clip selection or sequencing. These motion clips are used to train an adversarial motion prior, which specifies style rewards for training the character through reinforcement learning (RL). The adversarial RL procedure automatically selects which motion to perform, dynamically interpolating and generalizing from the dataset. Our system produces high-quality motions that are comparable to those achieved by state-of-the-art tracking-based techniques, while also being able to easily accommodate large datasets of unstructured motion clips. Composition of disparate skills emerges automatically from the motion prior, without requiring a high-level motion planner or other task-specific annotations of the motion clips. We demonstrate the effectiveness of our framework on a diverse cast of complex simulated characters and a challenging suite of motor control tasks.
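To make the reward structure concrete, the following is a minimal sketch of how an adversarial motion prior can supply a style reward that is mixed with a simple task reward, assuming a least-squares discriminator over state transitions. The class and function names, network sizes, and the reward weights `w_task`/`w_style` are illustrative assumptions, not the exact configuration used in this work.

```python
# Minimal sketch: a discriminator over (s, s') transitions supplies a
# style reward, which is mixed with a simple task reward for RL.
# Hyperparameters and names here are hypothetical.
import torch
import torch.nn as nn

class MotionDiscriminator(nn.Module):
    """Scores state transitions; trained so that reference-motion
    transitions score near +1 and policy transitions near -1."""
    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, s_next], dim=-1)).squeeze(-1)

def discriminator_loss(disc: MotionDiscriminator,
                       ref_s, ref_s_next, pol_s, pol_s_next) -> torch.Tensor:
    # Least-squares objective: push reference transitions toward +1
    # and policy-generated transitions toward -1.
    loss_ref = ((disc(ref_s, ref_s_next) - 1.0) ** 2).mean()
    loss_pol = ((disc(pol_s, pol_s_next) + 1.0) ** 2).mean()
    return loss_ref + loss_pol

def style_reward(disc: MotionDiscriminator,
                 s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
    # Reward is maximal when the discriminator mistakes a policy
    # transition for reference motion (score near +1), clipped at 0.
    with torch.no_grad():
        d = disc(s, s_next)
    return torch.clamp(1.0 - 0.25 * (d - 1.0) ** 2, min=0.0)

def combined_reward(r_task: torch.Tensor, r_style: torch.Tensor,
                    w_task: float = 0.5, w_style: float = 0.5) -> torch.Tensor:
    # Mix the hand-specified task reward with the learned style reward;
    # the weights are assumed for illustration.
    return w_task * r_task + w_style * r_style
```

In a training loop of this kind, the discriminator would be updated alternately with the policy: sampled reference-motion transitions and fresh policy rollouts feed `discriminator_loss`, while the RL update optimizes the policy against `combined_reward`, so no explicit clip selection or sequencing step is ever needed.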