MS-PPO: Morphological-Symmetry-Equivariant Policy for Legged Robot Locomotion

Reinforcement learning has recently enabled impressive locomotion capabilities on legged robots; however, most policy architectures remain morphology- and symmetry-agnostic, leading to inefficient training and limited generalization. This work introduces MS-PPO, a morphological-symmetry-equivariant policy learning framework that encodes robot kinematic structure and morphological symmetries directly into the policy network. We construct a morphology-informed graph neural architecture that is provably equivariant with respect to the robot's morphological symmetry group actions, ensuring consistent policy responses under symmetric states while maintaining invariance in value estimation. This design eliminates the need for tedious reward shaping or costly data augmentation, which are typically required to enforce symmetry. We evaluate MS-PPO in simulation on Unitree Go2 and Xiaomi CyberDog2 robots across diverse locomotion tasks, including trotting, pronking, slope walking, and bipedal turning, and further deploy the learned policies on hardware. Extensive experiments show that MS-PPO achieves superior training stability, symmetry generalization ability, and sample efficiency in challenging locomotion tasks, compared to state-of-the-art baselines. These findings demonstrate that embedding both kinematic structure and morphological symmetry into policy learning provides a powerful inductive bias for legged robot locomotion control. Our code will be made publicly available at https://lunarlab-gatech.github.io/MS-PPO/.
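The two properties named in the abstract, an equivariant policy and an invariant value function, can be illustrated with a small sketch. This is not the morphology-informed graph neural architecture described above; it swaps in plain group averaging over a two-element left/right reflection group, applied to an arbitrary toy network. The leg ordering, the permutation `PERM`, and all function names are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical toy setup: one scalar per leg, ordered [FL, FR, RL, RR].
# The sagittal reflection g swaps left and right legs; g is its own inverse.
PERM = [1, 0, 3, 2]


def reflect(x):
    """Apply the left/right reflection g to a per-leg vector."""
    return [x[PERM[i]] for i in range(4)]


def make_mlp(seed, n_in=4, n_hidden=8, n_out=4):
    """Return an arbitrary one-hidden-layer network with no built-in symmetry."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]

    def f(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in w2]

    return f


raw_policy = make_mlp(seed=0)
raw_value = make_mlp(seed=1, n_out=1)


def equivariant_policy(s):
    """Average over {e, g}: pi(s) = 0.5 * (f(s) + g f(g s))."""
    a = raw_policy(s)
    b = reflect(raw_policy(reflect(s)))
    return [0.5 * (ai + bi) for ai, bi in zip(a, b)]


def invariant_value(s):
    """Average over {e, g}: V(s) = 0.5 * (v(s) + v(g s))."""
    return 0.5 * (raw_value(s)[0] + raw_value(reflect(s))[0])


# Usage: mirrored states should yield mirrored actions and equal values.
state = [0.1, -0.4, 0.7, 0.2]
action_of_mirrored = equivariant_policy(reflect(state))
mirrored_action = reflect(equivariant_policy(state))
policy_gap = max(abs(p - q) for p, q in zip(action_of_mirrored, mirrored_action))
value_gap = abs(invariant_value(reflect(state)) - invariant_value(state))
```

Here `policy_gap` and `value_gap` should both be zero up to floating-point error, which is what "consistent policy responses under symmetric states" and "invariance in value estimation" mean in this toy setting. Group averaging costs one extra forward pass per group element at every call, whereas an architecture that is equivariant by construction, as the abstract describes, builds the constraint into the network itself.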

