"Bandits with Knapsacks" (BwK) is a general model for multi-armed bandits under supply/budget constraints. While worst-case regret bounds for BwK are well-understood, we present three results that go beyond the worst-case perspective. First, we provide upper and lower bounds which amount to a full characterization for logarithmic, instance-dependent regret rates. Second, we consider "simple regret" in BwK, which tracks the algorithm's performance in a given round, and prove that it is small in all but a few rounds. Third, we provide a general template for extensions from bandits to BwK which takes advantage of some known helpful structure and apply this template to combinatorial semi-bandits and linear contextual bandits. Our results build on the BwK algorithm from (Agrawal and Devanur, 2014), providing new analyses thereof.
翻译:使用 Knapsacks (BwK) 的“ Bandits with Knapsacks” (BwK) 是多武装强盗在供应/预算限制下的一般模式。 虽然对 BwK 最坏的遗憾范围非常理解, 但我们展示了三种结果, 超越了最坏情况的角度。 首先, 我们提供上下界限, 相当于对数的完整描述, 取决于实例的遗憾率。 其次, 我们考虑 BwK 中的“ 简单遗憾 ”, 以跟踪算法在特定回合中的表现, 并证明它除了几轮之外都是小的。 第三, 我们为从强盗到 BwK 的扩展提供了一个一般模板, 利用一些已知的有用结构, 将这个模板用于组合半强盗和线性背景强盗。 我们的结果建立在 BwK 算法( Agrawal and Devanur, 2014) 的基础上, 提供了新的分析 。