Pretraining a Unified PDDL Domain from Real-World Demonstrations for Generalizable Robot Task Planning

Robotic task planning in real-world environments requires reasoning over implicit constraints conveyed through language and vision. While large language models (LLMs) and vision-language models (VLMs) offer strong priors, they struggle with long-horizon structure and symbolic grounding. Existing methods that combine LLMs with symbolic planning often rely on handcrafted or narrow domains, which limits generalization. We propose UniDomain, a framework that pre-trains a PDDL domain from robot manipulation demonstrations and applies it to online robotic task planning. It extracts atomic domains from 12,393 manipulation videos to form a unified domain with 3,137 operators, 2,875 predicates, and 16,481 causal edges. Given a target class of tasks, it retrieves relevant atomic domains from the unified domain and systematically fuses them into high-quality meta-domains that support compositional generalization in planning. Experiments on diverse real-world tasks show that UniDomain solves complex, unseen tasks in a zero-shot manner, achieving up to 58% higher task success and a 160% improvement in plan optimality over state-of-the-art LLM and LLM-PDDL baselines.
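To make the retrieve-then-fuse idea concrete, the following is a minimal toy sketch and not UniDomain's actual implementation. It assumes a simplified STRIPS-style operator representation, keyword overlap as a stand-in for retrieval, and a naive union as a stand-in for fusion. All names (`Operator`, `AtomicDomain`, `retrieve`, `fuse`, `causal_edges`) and the example predicates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Simplified STRIPS-style operator (assumption: add effects only)."""
    name: str
    preconditions: frozenset
    effects: frozenset


@dataclass
class AtomicDomain:
    """Hypothetical stand-in for a domain extracted from one demonstration."""
    task: str
    operators: list


def retrieve(library, query, top_k=2):
    """Rank atomic domains by word overlap with the query (toy retrieval)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(d.task.lower().split())), d) for d in library]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [d for score, d in ranked[:top_k] if score > 0]


def fuse(domains):
    """Union operators by name, merging preconditions and effects on clashes."""
    merged = {}
    for domain in domains:
        for op in domain.operators:
            if op.name in merged:
                prev = merged[op.name]
                op = Operator(op.name,
                              prev.preconditions | op.preconditions,
                              prev.effects | op.effects)
            merged[op.name] = op
    predicates = set()
    for op in merged.values():
        predicates |= op.preconditions | op.effects
    return merged, predicates


def causal_edges(operators):
    """Edge (a, b) whenever some effect of a is a precondition of b."""
    ops = list(operators)
    return {(a.name, b.name) for a in ops for b in ops
            if a is not b and a.effects & b.preconditions}


# Toy library of three atomic domains (illustrative data, not from the paper).
library = [
    AtomicDomain("open the drawer",
                 [Operator("open", frozenset({"closed"}), frozenset({"opened"}))]),
    AtomicDomain("pick the cup from the drawer",
                 [Operator("pick", frozenset({"opened", "hand_empty"}),
                           frozenset({"holding"}))]),
    AtomicDomain("wipe the table",
                 [Operator("wipe", frozenset({"holding_cloth"}), frozenset({"clean"}))]),
]

selected = retrieve(library, "pick the cup after you open the drawer")
operators, predicates = fuse(selected)
edges = causal_edges(operators.values())
print(sorted(operators), sorted(edges))
```

In this toy setup the fused meta-domain contains only the `open` and `pick` operators, and the single causal edge records that `open` enables `pick`. A real system would need semantic retrieval and conflict-aware merging of predicates, which this sketch deliberately omits.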
