We introduce UMI-on-Air, a framework for embodiment-aware deployment of embodiment-agnostic manipulation policies. Our approach leverages diverse, unconstrained human demonstrations collected with a handheld gripper (UMI) to train generalizable visuomotor policies. A central challenge in transferring these policies to constrained robotic embodiments, such as aerial manipulators, is the mismatch in control and robot dynamics, which often leads to out-of-distribution behaviors and poor execution. To address this, we propose Embodiment-Aware Diffusion Policy (EADP), which couples a high-level UMI policy with a low-level embodiment-specific controller at inference time. By integrating gradient feedback from the controller's tracking cost into the diffusion sampling process, our method steers trajectory generation towards dynamically feasible modes tailored to the deployment embodiment. This enables plug-and-play, embodiment-aware trajectory adaptation at test time. We validate our approach on multiple long-horizon and high-precision aerial manipulation tasks, showing improved success rates, efficiency, and robustness under disturbances compared to unguided diffusion baselines. Finally, we demonstrate deployment in previously unseen environments, using UMI demonstrations collected in the wild, highlighting a practical pathway for scaling generalizable manipulation skills across diverse, and even highly constrained, embodiments. All code, data, and checkpoints will be publicly released after acceptance. Result videos can be found at umi-on-air.github.io.
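For intuition, gradient guidance of this kind is typically realized as a classifier-guidance-style modification of the reverse diffusion loop: at each denoising step, the gradient of the low-level controller's tracking cost is used to nudge the sampled action trajectory toward dynamically feasible modes. The sketch below is an illustrative assumption, not the released EADP implementation; `denoiser`, `reverse_step`, `tracking_cost`, and `guidance_scale` are hypothetical names standing in for the UMI diffusion policy, its sampler update, and the embodiment-specific controller's cost.

```python
# Illustrative sketch only (assumed interfaces, not the paper's code):
# cost-gradient guidance folded into the reverse diffusion sampling loop.
import torch


def guided_sample(denoiser, reverse_step, tracking_cost, x_T, timesteps,
                  guidance_scale=1.0):
    """Reverse diffusion over an action trajectory with tracking-cost guidance.

    denoiser(x, t)        -> predicted clean trajectory (hypothetical interface)
    reverse_step(x, x0, t) -> one unguided DDPM/DDIM update (hypothetical)
    tracking_cost(x0)     -> scalar cost from the embodiment-specific controller
    """
    x = x_T
    for t in timesteps:  # e.g. reversed(range(T))
        x = x.detach().requires_grad_(True)
        x0_pred = denoiser(x, t)              # policy's denoised trajectory estimate
        cost = tracking_cost(x0_pred)         # low-level controller's tracking cost
        grad, = torch.autograd.grad(cost, x)  # d(cost) / d(noisy trajectory)
        with torch.no_grad():
            x = reverse_step(x, x0_pred, t)   # standard, unguided diffusion update
            x = x - guidance_scale * grad     # steer toward dynamically feasible modes
    return x
```

In this form, the policy weights stay frozen and only the sampling loop changes, which is what makes the embodiment-aware adaptation plug-and-play at test time.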