Embodied agents tasked with complex scenarios, whether in real or simulated environments, rely heavily on robust planning capabilities. When instructions are formulated in natural language, large language models (LLMs), with their extensive linguistic knowledge, can fulfill this role. However, an appropriate architecture must be designed to effectively exploit the ability of such models to resolve linguistic ambiguity, to retrieve information from the environment, and to ground plans in the skills available to the agent. We propose a Hierarchical Embodied Language Planner, called HELP, consisting of a set of LLM-based agents, each dedicated to solving a different subtask. We evaluate the proposed approach on a household task and perform real-world experiments with an embodied agent. We also focus on the use of open-source LLMs with a relatively small number of parameters, to enable autonomous deployment.