Industrial automation increasingly requires flexible control strategies that can adapt to changing tasks and environments. Agents based on Large Language Models (LLMs) offer potential for such adaptive planning and execution but lack standardized benchmarks for systematic comparison. We introduce a benchmark built around an executable simulation environment that represents the Blocksworld problem and provides five complexity categories. By integrating the Model Context Protocol (MCP) as a standardized tool interface, diverse agent architectures can be connected to the benchmark and evaluated against it without implementation-specific modifications. A single-agent implementation demonstrates the benchmark's applicability, establishing quantitative metrics for comparing LLM-based planning and execution approaches.
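The executable Blocksworld environment can be pictured as a small state-transition system whose action primitives an MCP tool interface would expose to an agent. The following minimal sketch is purely illustrative under that assumption; the function names (`move`, `clear`, `satisfies`) and state encoding are our own and not the benchmark's actual API.

```python
# Illustrative Blocksworld sketch (hypothetical names, not the benchmark's code).
# A state maps each block to what it rests on: "table" or another block.

def clear(state, block):
    """A block is clear if nothing rests on top of it."""
    return all(below != block for below in state.values())

def move(state, block, dest):
    """Move `block` onto `dest` ("table" or a clear block); returns a new state."""
    if not clear(state, block):
        raise ValueError(f"{block} is not clear")
    if dest != "table" and not clear(state, dest):
        raise ValueError(f"{dest} is not clear")
    new_state = dict(state)
    new_state[block] = dest
    return new_state

def satisfies(state, goal):
    """A goal is a partial state: every goal fact must hold in `state`."""
    return all(state.get(b) == on for b, on in goal.items())

# Example: build the tower A-on-B-on-C starting with all blocks on the table.
s = {"A": "table", "B": "table", "C": "table"}
s = move(s, "B", "C")
s = move(s, "A", "B")
print(satisfies(s, {"A": "B", "B": "C"}))  # True
```

In such a setup, each primitive (e.g. `move`) would be registered as an MCP tool so that any compliant agent can invoke it without benchmark-specific glue code.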