Embodied agents tasked with complex scenarios, whether in real or simulated environments, rely heavily on robust planning capabilities. When instructions are formulated in natural language, large language models (LLMs), with their extensive linguistic knowledge, can fulfill this role. However, an appropriate architecture must be designed to effectively exploit the ability of such models to resolve linguistic ambiguity, to retrieve information from the environment, and to ground plans in the skills available to the agent. We propose a Hierarchical Embodied Language Planner, called HELP, consisting of a set of LLM-based agents, each dedicated to solving a different subtask. We evaluate the proposed approach on a household task and perform real-world experiments with an embodied agent. We also focus on the use of open-source LLMs with a relatively small number of parameters, to enable autonomous deployment.