We consider the problem of synthesizing multi-action human motion sequences of arbitrary length. Existing approaches have mastered motion sequence generation in single-action scenarios but fail to generalize to multi-action, arbitrary-length sequences. We fill this gap with a novel, efficient approach that combines the expressiveness of Recurrent Transformers with the generative richness of conditional Variational Autoencoders. The proposed iterative approach generates smooth and realistic human motion sequences with an arbitrary number of actions and frames, in space and time linear in the sequence length. We train and evaluate the approach on the PROX dataset, which we augment with ground-truth action labels. Experimental evaluation shows significant improvements in FID score and semantic-consistency metrics over the state of the art.
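To make the iterative scheme concrete, below is a minimal, hypothetical PyTorch sketch of segment-by-segment generation: a conditional VAE decoder produces one motion segment per action label, conditioned on a sampled latent and a fixed-size recurrent summary of the frames generated so far. All module names, dimensions, and the GRU (standing in for the recurrent transformer, for brevity) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of iterative multi-action motion generation.
# A GRU stands in for the recurrent transformer; dimensions are made up.
import torch
import torch.nn as nn

POSE_DIM, LATENT_DIM, HIDDEN_DIM, NUM_ACTIONS, SEG_LEN = 63, 32, 128, 10, 30

class SegmentDecoder(nn.Module):
    """Decodes one fixed-length motion segment from (z, action, memory)."""
    def __init__(self):
        super().__init__()
        self.action_emb = nn.Embedding(NUM_ACTIONS, HIDDEN_DIM)
        self.fuse = nn.Linear(LATENT_DIM + 2 * HIDDEN_DIM, HIDDEN_DIM)
        self.gru = nn.GRU(HIDDEN_DIM, HIDDEN_DIM, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, POSE_DIM)

    def forward(self, z, action, memory):
        # Fuse latent sample, action embedding, and recurrent context.
        cond = torch.cat([z, self.action_emb(action), memory], dim=-1)
        h = self.fuse(cond).unsqueeze(1).expand(-1, SEG_LEN, -1)
        frames, _ = self.gru(h)
        return self.out(frames)  # (batch, SEG_LEN, POSE_DIM)

class SegmentEncoder(nn.Module):
    """Summarizes a generated segment into a fixed-size context vector."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(POSE_DIM, HIDDEN_DIM)

    def forward(self, seg):
        return self.proj(seg).mean(dim=1)  # mean-pool frames to one vector

@torch.no_grad()
def generate(decoder, encoder, actions):
    """Generate one segment per action label. Because each step only sees
    a fixed-size memory, cost grows linearly with sequence length."""
    batch = actions.shape[0]
    memory = torch.zeros(batch, HIDDEN_DIM)
    segments = []
    for t in range(actions.shape[1]):
        z = torch.randn(batch, LATENT_DIM)       # sample from the prior
        seg = decoder(z, actions[:, t], memory)  # next motion segment
        memory = encoder(seg)                    # summarize for next step
        segments.append(seg)
    return torch.cat(segments, dim=1)
```

Under these assumptions, a call like `generate(SegmentDecoder(), SegmentEncoder(), torch.randint(0, NUM_ACTIONS, (2, 4)))` yields a tensor of shape (2, 120, 63): two sequences of four 30-frame segments, one per action, with the fixed-size memory carrying context across segment boundaries for smooth transitions.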