Statistical inference from data generated by multi-armed bandit (MAB) algorithms is challenging due to their adaptive, non-i.i.d. nature. A classical manifestation is that sample averages of arm rewards under bandit sampling may fail to satisfy a central limit theorem. Lai and Wei's stability condition provides a sufficient, and essentially necessary criterion, for asymptotic normality in bandit problems. While the celebrated Upper Confidence Bound (UCB) algorithm satisfies this stability condition, it is not minimax optimal, raising the question of whether minimax optimality and statistical stability can be achieved simultaneously. In this paper, we analyze the stability properties of a broad class of bandit algorithms that are based on the optimism principle. We establish general structural conditions under which such algorithms violate the Lai-Wei stability criterion. As a consequence, we show that widely used minimax-optimal UCB-style algorithms, including MOSS, Anytime-MOSS, Vanilla-MOSS, ADA-UCB, OC-UCB, KL-MOSS, KL-UCB++, KL-UCB-SWITCH, and Anytime KL-UCB-SWITCH, are unstable. We further complement our theoretical results with numerical simulations demonstrating that, in all these cases, the sample means fail to exhibit asymptotic normality. Overall, our findings suggest a fundamental tension between stability and minimax optimal regret, raising the question of whether it is possible to design bandit algorithms that achieve both. Understanding whether such simultaneously stable and minimax optimal strategies exist remains an important open direction.
翻译:从多臂赌博机算法生成的数据中进行统计推断具有挑战性,这源于其自适应、非独立同分布的特性。一个经典的表现是,在赌博机采样下,臂奖励的样本均值可能无法满足中心极限定理。Lai和Wei的稳定性条件为赌博机问题中的渐近正态性提供了充分且本质上必要的判据。虽然著名的上置信界算法满足该稳定性条件,但它并非极小极大最优,这引发了极小极大最优性与统计稳定性能否同时实现的疑问。本文分析了基于乐观原则的广泛赌博机算法类的稳定性特性。我们建立了此类算法违反Lai-Wei稳定性判据的一般结构条件。由此证明,广泛使用的极小极大最优UCB类算法(包括MOSS、Anytime-MOSS、Vanilla-MOSS、ADA-UCB、OC-UCB、KL-MOSS、KL-UCB++、KL-UCB-SWITCH和Anytime KL-UCB-SWITCH)均不稳定。我们进一步通过数值模拟补充理论结果,表明在所有情况下样本均值均未呈现渐近正态性。总体而言,我们的发现揭示了稳定性与极小极大最优遗憾之间的根本性矛盾,提出了能否设计同时实现两者的赌博机算法的问题。理解这种同时具备稳定性和极小极大最优性的策略是否存在,仍然是一个重要的开放研究方向。