Large Language Models (LLMs) in multi-agent systems (MAS) have shown promise on complex tasks, yet current training methods lack principled ways to connect system-level evaluation with agent-level and message-level learning. We propose a theoretical framework that unifies cooperative game-theoretic attribution with process reward modeling, transforming system-level evaluation into agent-level credit and then into response-level training signals. Unlike prior approaches that rely solely on outcome attribution (e.g., Shapley values) or on step-level labels (e.g., from process reward models, PRMs), our method produces local, signed, and credit-conserving signals. In success cases, Shapley-based credit assignment fairly allocates the outcome across agents and is refined into per-message rewards that promote cooperation while discouraging redundancy and sabotage. In failure cases, first-error localization yields repair-aware preferences that penalize harmful steps while rewarding corrective attempts. The resulting signals are bounded, cooperative, and directly compatible with reinforcement-based or preference-based post-training, providing a unified and auditable pathway from global evaluation to local supervision in LLM multi-agent training. Our contribution is conceptual: we present the theoretical foundation and the resulting training signals, leaving empirical validation to future work.
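To make the credit-assignment step concrete, the sketch below computes exact Shapley values for a toy system of three agents and then splits each agent's credit uniformly across its messages. This is an illustration only, not the paper's procedure: the agent names, the coalition value function `v`, and the uniform per-message split are hypothetical placeholders for whatever system-level evaluator and refinement rule a concrete instantiation would use.

```python
# Illustrative sketch only: exact Shapley credit over a small agent set,
# followed by a naive per-message refinement. The coalition values and
# message log below are hypothetical placeholders.
from itertools import combinations
from math import factorial

def shapley_credit(agents, v):
    """Exact Shapley value phi_i for each agent i under coalition value function v."""
    n = len(agents)
    credit = {a: 0.0 for a in agents}
    for a in agents:
        others = [x for x in agents if x != a]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                credit[a] += weight * (v(frozenset(S) | {a}) - v(frozenset(S)))
    return credit

def per_message_rewards(credit, messages):
    """Split each agent's credit uniformly over its messages (one possible refinement)."""
    counts = {}
    for agent, _ in messages:
        counts[agent] = counts.get(agent, 0) + 1
    return [(agent, text, credit[agent] / counts[agent]) for agent, text in messages]

if __name__ == "__main__":
    agents = ["planner", "coder", "critic"]
    # Hypothetical coalition values: system-level score achieved by each subset of agents.
    values = {frozenset(): 0.0,
              frozenset({"planner"}): 0.2, frozenset({"coder"}): 0.3, frozenset({"critic"}): 0.0,
              frozenset({"planner", "coder"}): 0.8, frozenset({"planner", "critic"}): 0.3,
              frozenset({"coder", "critic"}): 0.5, frozenset({"planner", "coder", "critic"}): 1.0}
    v = lambda S: values[frozenset(S)]
    credit = shapley_credit(agents, v)   # credit-conserving: values sum to v(all agents) = 1.0
    messages = [("planner", "draft plan"), ("coder", "write code"),
                ("coder", "fix bug"), ("critic", "review")]
    print(credit)
    print(per_message_rewards(credit, messages))
```

The uniform split is deliberately simple; the signed, credit-conserving property comes from the Shapley step (the per-agent values sum to the grand-coalition score), while any message-level weighting scheme consistent with that conservation could replace the uniform rule.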