Top Two algorithms arose as an adaptation of Thompson sampling to best arm identification in multi-armed bandit models (Russo, 2016), for parametric families of arms. They select the next arm to sample by randomizing between two candidate arms, a leader and a challenger. Despite their good empirical performance, theoretical guarantees for fixed-confidence best arm identification have only been obtained when the arms are Gaussian with known variances. In this paper, we provide a general analysis of Top Two methods, which identifies desirable properties of the leader, the challenger, and the (possibly non-parametric) distributions of the arms. As a result, we obtain theoretically supported Top Two algorithms for best arm identification with bounded distributions. Our proof method demonstrates in particular that the sampling step used to select the leader, inherited from Thompson sampling, can be replaced by other choices, such as selecting the empirical best arm.
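To make the leader/challenger mechanism concrete, the following is a minimal sketch of one Top Two sampling step, not the algorithm analyzed in the paper. It assumes the leader is the empirical best arm (the alternative mentioned above) and, purely for illustration, picks the challenger by minimizing a transportation cost for Gaussian arms with unit variance. The function names, the parameter `beta` (probability of sampling the leader), and the cost formula are assumptions introduced here; every arm is assumed to have been sampled at least once.

```python
import random


def top_two_candidates(means, counts):
    """Return (leader, challenger) indices from empirical means and sample counts.

    Hypothetical helper: the leader is the empirical best arm, and the challenger
    minimizes an assumed unit-variance Gaussian transportation cost.
    """
    arms = range(len(means))
    # Leader: arm with the highest empirical mean.
    leader = max(arms, key=lambda a: means[a])

    def cost(a):
        # Illustrative cost of confusing arm `a` with the leader (assumption).
        gap = max(means[leader] - means[a], 0.0)
        return gap ** 2 / (2.0 * (1.0 / counts[leader] + 1.0 / counts[a]))

    # Challenger: the non-leader arm that is hardest to tell apart from the leader.
    challenger = min((a for a in arms if a != leader), key=cost)
    return leader, challenger


def top_two_step(means, counts, beta=0.5, rng=random):
    """Pick the next arm: the leader with probability beta, else the challenger."""
    leader, challenger = top_two_candidates(means, counts)
    return leader if rng.random() < beta else challenger


if __name__ == "__main__":
    # Three arms, each sampled ten times so far.
    print(top_two_step([0.9, 0.5, 0.8], [10, 10, 10]))
```

Under these assumptions, the arm with empirical mean 0.8 becomes the challenger to the arm with mean 0.9, since it is the one most easily mistaken for the best arm. Replacing the leader rule with a posterior sample would correspond to the step inherited from Thompson sampling.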