Large Language Model (LLM) agents are increasingly studied in multi-turn, multi-agent scenarios, yet most existing setups emphasize open-ended role-play rather than controlled evaluation. We introduce AsymPuzl, a minimal but expressive two-agent puzzle environment designed to isolate communication under information asymmetry. Each agent observes a complementary but incomplete view of a symbolic puzzle, and the two must exchange messages to solve it cooperatively. Using a diverse set of current-generation and open-source LLMs, we show that (i) strong models such as GPT-5 and Claude-4.0 reliably converge on the solution across puzzle sizes by sharing complete information within two turns, (ii) weaker models often ignore partner messages or over-correct their hypotheses, and (iii) feedback design is non-trivial: simple self-feedback improves success rates, while detailed joint feedback can hurt performance. These findings show that even in simple cooperative tasks, LLM communication strategies diverge and depend on the granularity of feedback signals. AsymPuzl thus provides a testbed for probing the limits of multi-turn cooperation and opens avenues for studying coordination mechanisms.