LLM-based agents are increasingly deployed for expert decision support, yet human-AI teams in high-stakes settings still do not reliably outperform the best individual. We argue this complementarity gap reflects a fundamental mismatch: current agents are trained as answer engines, not as partners in the collaborative sensemaking through which experts actually make decisions. Sensemaking (the ability to co-construct causal explanations, surface uncertainties, and adapt goals) is the key capability that current training pipelines do not explicitly develop or evaluate. We propose Collaborative Causal Sensemaking (CCS) as a research agenda to develop this capability from the ground up, spanning new training environments that reward collaborative thinking, representations for shared human-AI mental models, and evaluation centred on trust and complementarity. These directions can advance multi-agent systems (MAS) research toward agents that think with their human partners rather than for them.