Motivated by the challenges of edge inference, we study a variant of the cascade bandit model in which each arm corresponds to an inference model with an associated accuracy and error probability. We analyse four decision-making policies: Explore-then-Commit, Action Elimination, Lower Confidence Bound (LCB), and Thompson Sampling. For each, we provide sharp theoretical regret guarantees. Unlike in classical bandit settings, Explore-then-Commit and Action Elimination incur suboptimal regret because they commit to a fixed ordering after the exploration phase, limiting their ability to adapt. In contrast, LCB and Thompson Sampling continuously update their decisions based on observed feedback, achieving constant O(1) regret. Simulations corroborate these theoretical findings, highlighting the crucial role of adaptivity for efficient edge inference under uncertainty.
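To make the adaptive policy concrete, the following is a minimal Python sketch of one plausible LCB-style policy for this setting: candidate models are ranked by a lower confidence bound on their error probability (equivalently, an upper confidence bound on accuracy), and each query is routed down the cascade until some model answers correctly. The model accuracies, confidence radius, and Bernoulli feedback are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5                                  # number of candidate inference models
true_acc = rng.uniform(0.5, 0.95, K)   # hypothetical per-model accuracies
T = 5000                               # horizon (number of queries)

succ = np.zeros(K)    # observed correct answers per model
pulls = np.zeros(K)   # number of times each model's output was examined

for t in range(1, T + 1):
    # LCB on the error probability: optimistic (low-error) models
    # are placed earlier in the cascade.
    err_hat = 1.0 - succ / np.maximum(pulls, 1)
    bonus = np.sqrt(1.5 * np.log(t) / np.maximum(pulls, 1))
    lcb = np.where(pulls > 0, err_hat - bonus, -np.inf)  # unexplored models first
    order = np.argsort(lcb)  # ascending LCB on error

    # Route the query down the cascade until a model answers correctly.
    for k in order:
        pulls[k] += 1
        if rng.random() < true_acc[k]:  # model k answers correctly
            succ[k] += 1
            break                       # the cascade stops at the first success

print("estimated accuracies:", np.round(succ / np.maximum(pulls, 1), 3))
print("true accuracies:     ", np.round(true_acc, 3))
```

Because the ranking is recomputed from updated confidence bounds at every round, the policy keeps adapting after any initial exploration, which is the property the abstract credits for the O(1) regret of LCB and Thompson Sampling.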