EGM: Efficiently Learning General Motion Tracking Policy for High Dynamic Humanoid Whole-Body Control

Learning a general motion tracking policy from human motions shows great potential for versatile humanoid whole-body control. Conventional approaches are inefficient in both data utilization and training, and they perform poorly when tracking highly dynamic motions. To address these challenges, we propose EGM, a framework that enables efficient learning of a general motion tracking policy. EGM integrates four core designs. First, we introduce a Bin-based Cross-motion Curriculum Adaptive Sampling strategy that dynamically adjusts sampling probabilities based on the tracking error of each motion bin, efficiently balancing training across motions of varying difficulty and duration. Second, the sampled data is processed by our proposed Composite Decoupled Mixture-of-Experts (CDMoE) architecture, which improves tracking of motions from different distributions by grouping experts separately for the upper and lower body and by decoupling orthogonal experts from shared experts, so that dedicated features and general features are handled separately. Third, central to our approach is a key insight: for training a general motion tracking policy, data quality and diversity are paramount. Finally, building on these designs, we develop a three-stage curriculum training flow that progressively enhances the policy's robustness against disturbances. Despite being trained on only 4.08 hours of data, EGM generalizes robustly across 49.25 hours of test motions, outperforming baselines on both routine and highly dynamic tasks.
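
To make the bin-based sampling idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes that each motion is cut into fixed-length bins, that a smoothed tracking error is kept per bin, and that bins are drawn with probability proportional to that error, mixed with a small uniform floor. The class name `BinSampler` and the parameters `bin_seconds`, `ema`, and `uniform_mix` are hypothetical and do not come from the abstract.

```python
import math
import random


class BinSampler:
    """Hypothetical error-weighted sampler over fixed-length motion bins."""

    def __init__(self, durations, bin_seconds=2.0, ema=0.9, uniform_mix=0.1):
        # All default values here are illustrative assumptions.
        self.ema = ema                  # smoothing factor for per-bin error
        self.uniform_mix = uniform_mix  # floor so that no bin is starved
        # Each bin is (motion_id, start_time); longer motions yield more bins.
        self.bins = []
        for motion_id, duration in enumerate(durations):
            for k in range(math.ceil(duration / bin_seconds)):
                self.bins.append((motion_id, k * bin_seconds))
        # Start every bin with the same error so sampling begins uniform.
        self.errors = [1.0] * len(self.bins)

    def probabilities(self):
        """Return one sampling probability per bin (sums to 1)."""
        n = len(self.bins)
        total = sum(self.errors)
        if total <= 0.0:
            return [1.0 / n] * n
        return [
            (1.0 - self.uniform_mix) * e / total + self.uniform_mix / n
            for e in self.errors
        ]

    def sample(self, rng=random):
        """Draw a bin index and its (motion_id, start_time) pair."""
        index = rng.choices(range(len(self.bins)), weights=self.probabilities())[0]
        return index, self.bins[index]

    def update(self, index, tracking_error):
        """Blend a new non-negative tracking error into the bin's running value."""
        old = self.errors[index]
        self.errors[index] = self.ema * old + (1.0 - self.ema) * tracking_error
```

In a training loop under these assumptions, one would call `sample()` to pick the reference segment for an episode and `update()` with the measured tracking error afterwards, so that poorly tracked bins are revisited more often while the uniform floor keeps already-mastered bins in rotation.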
