Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits

Logistic Bandits have recently attracted substantial attention by providing an uncluttered yet challenging framework for understanding the impact of non-linearity in parametrized bandits. Faury et al. (2020) showed that the learning-theoretic difficulties of Logistic Bandits can be embodied by a problem-dependent constant $\kappa$, which can be large (sometimes prohibitively so) and characterizes the magnitude of the reward's non-linearity. In this paper we introduce a novel algorithm for which we provide a refined analysis. This allows for a better characterization of the effect of non-linearity and yields improved problem-dependent guarantees. In the most favorable cases this leads to a regret upper bound scaling as $\tilde{\mathcal{O}}(d\sqrt{T/\kappa})$, which dramatically improves over the $\tilde{\mathcal{O}}(d\sqrt{T}+\kappa)$ state-of-the-art guarantees. We prove that this rate is minimax-optimal by deriving an $\Omega(d\sqrt{T/\kappa})$ problem-dependent lower bound. Our analysis identifies two regimes of the regret (permanent and transitory), which ultimately reconciles Faury et al. (2020) with the Bayesian approach of Dong et al. (2019). In contrast to previous works, we find that in the permanent regime non-linearity can dramatically ease the exploration-exploitation trade-off. While non-linearity also impacts the length of the transitory phase in a problem-dependent fashion, we show that this impact is mild in most reasonable configurations.
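To give a feel for the two rates quoted above, the following minimal sketch compares $d\sqrt{T/\kappa}$ with $d\sqrt{T}+\kappa$ numerically, ignoring constants and logarithmic factors. The abstract does not define $\kappa$; as an illustrative assumption only, the helper `kappa_at` models it as the inverse of the sigmoid's derivative at a single logit `z`, which grows roughly like $e^{|z|}$. All function names here are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic link function mu(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z: float) -> float:
    """Derivative of the logistic function: mu'(z) = mu(z) * (1 - mu(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def kappa_at(z: float) -> float:
    """ASSUMPTION: model kappa as 1 / mu'(z) at one logit z (illustrative only)."""
    return 1.0 / sigmoid_derivative(z)


def rates(d: int, T: int, kappa: float) -> tuple[float, float]:
    """Return (d*sqrt(T/kappa), d*sqrt(T) + kappa), dropping constants and logs."""
    improved = d * math.sqrt(T / kappa)
    previous = d * math.sqrt(T) + kappa
    return improved, previous


if __name__ == "__main__":
    d, T = 10, 10_000
    for z in (0.0, 2.0, 4.0, 6.0):
        kappa = kappa_at(z)
        improved, previous = rates(d, T, kappa)
        print(f"z={z:.1f}  kappa={kappa:9.2f}  "
              f"d*sqrt(T/kappa)={improved:8.2f}  d*sqrt(T)+kappa={previous:9.2f}")
```

Under this assumption, a larger logit means a larger $\kappa$, which shrinks the first rate while inflating the second. This matches the abstract's claim that non-linearity can ease the permanent-regime regret, though the sketch says nothing about the transitory phase.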
