Large language models enable flexible multi-agent planning but remain fragile in practice: verification is often circular, state changes are not tracked for repair, and small faults trigger costly global recomputation. We present ALAS, a stateful, disruption-aware framework that separates planning from non-circular validation, records a versioned execution log for grounded checks and restore points, and performs localized repair that preserves work in progress. The validator operates independently of the planning LLM with fresh, bounded context, avoiding self-check loops and mid-context attrition. The repair protocol edits only the minimal affected region under explicit policies (retry, catch, timeout, backoff, idempotency keys, compensation, loop guards) defined in a canonical workflow IR that maps to Amazon States Language and Argo Workflows. On job-shop scheduling suites (DMU, TA) spanning five classical benchmarks, ALAS matches or exceeds strong single-LLM and multi-agent baselines, achieving 83.7% success, reducing token usage by 60%, and running 1.82× faster under comparable settings. A minimal reliability study shows that the validator detects injected structural faults with low overhead, and that localized repair contains runtime perturbations with a bounded edit radius and less makespan degradation than global recompute. Results indicate that the combination of validator isolation, versioned execution logs, and localized repair provides measurable gains in efficiency, feasibility, and scalability for multi-agent LLM planning. Code and seeds will be released.