Algorithmic collusion has emerged as a central question in AI: will the interaction between different AI agents deployed in markets lead to collusion? More generally, understanding how emergent behavior, whether a cartel or market dominance by more advanced bots, affects the market as a whole is an important research question. We propose a hierarchical multi-agent reinforcement learning framework to study algorithmic collusion in market making. The framework comprises a self-interested market maker (Agent~A), trained in an uncertain environment shaped by an adversary, and three bottom-layer competitors: the self-interested Agent~B1, whose objective is to maximize its own profit and loss (PnL); the competitive Agent~B2, whose objective is to minimize its opponent's PnL; and the hybrid Agent~B$^\star$, which can modulate between the behaviors of the other two. To analyze how these agents shape one another's behavior and affect market outcomes, we propose interaction-level metrics that quantify behavioral asymmetry and system-level dynamics and that provide signals potentially indicative of emergent interaction patterns. Experimental results show that Agent~B2 achieves dominant performance in a zero-sum setting against B1, aggressively capturing order flow while tightening average spreads, thereby improving market execution efficiency. In contrast, Agent~B$^\star$ exhibits a self-interested inclination when co-existing with other profit-seeking agents: it secures a dominant market share through adaptive quoting, yet has a milder adverse impact on the rewards of Agents~A and B1 than B2 does. These findings suggest that adaptive incentive control supports more sustainable strategic co-existence in heterogeneous agent environments and offers a structured lens for evaluating behavioral design in algorithmic trading systems.
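The three bottom-layer objectives described above can be illustrated with a minimal Python sketch. The abstract does not specify how Agent B$^\star$ modulates between the two behaviors, so the convex mixing weight `w`, the `StepPnL` container, and the function names below are all hypothetical choices made for illustration, not the authors' formulation.

```python
from dataclasses import dataclass


@dataclass
class StepPnL:
    """Per-step PnL changes (hypothetical container, not from the paper)."""
    own: float       # PnL change of the agent being rewarded
    opponent: float  # PnL change of its opponent over the same step


def reward_b1(pnl: StepPnL) -> float:
    """Self-interested objective: maximize the agent's own PnL."""
    return pnl.own


def reward_b2(pnl: StepPnL) -> float:
    """Competitive objective: minimize the opponent's PnL."""
    return -pnl.opponent


def reward_b_star(pnl: StepPnL, w: float) -> float:
    """Hybrid objective (assumed form): convex mix of the two rewards.

    w = 0 recovers the self-interested reward, w = 1 the competitive one.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return (1.0 - w) * reward_b1(pnl) + w * reward_b2(pnl)
```

Under this assumed form, a step in which the agent gains 2.0 while its opponent gains 1.0 yields a reward of 2.0 at `w = 0`, -1.0 at `w = 1`, and 0.5 at `w = 0.5`.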