We consider a multi-armed bandit problem specified by a set of one-dimensional exponential family distributions endowed with a unimodal structure. We introduce IMED-UB, an algorithm that optimally exploits the unimodal structure, by adapting to this setting the Indexed Minimum Empirical Divergence (IMED) algorithm introduced by Honda and Takemura [2015]. Owing to our proof technique, we are able to provide a concise finite-time analysis of the IMED-UB algorithm. Numerical experiments show that IMED-UB is competitive with state-of-the-art algorithms.
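As a rough illustration of the idea, the sketch below computes the IMED index of Honda and Takemura [2015], N_a * KL(mean_a, best_mean) + log N_a, and minimizes it only over the empirical best arm and its neighbors. Several things here are assumptions made for illustration and are not taken from the abstract: the arms are Bernoulli, they are arranged on a line so that the neighbors of arm a are a-1 and a+1, every arm has already been pulled at least once, and ties for the empirical best arm are broken in favor of the least-pulled arm. The function names are hypothetical.

```python
import math


def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def imed_index(mean, count, best_mean):
    """IMED index of one arm: count * KL(mean, best_mean) + log(count)."""
    return count * kl_bernoulli(mean, best_mean) + math.log(count)


def choose_arm_unimodal(means, counts):
    """Pick the next arm on a line graph (assumed structure).

    means[a]  : empirical mean of arm a (assumed Bernoulli rewards)
    counts[a] : number of pulls of arm a, assumed >= 1 for every arm
    """
    best_mean = max(means)
    # Empirical best arm; ties broken by fewest pulls (an assumption).
    leader = min(
        (a for a in range(len(means)) if means[a] == best_mean),
        key=lambda a: counts[a],
    )
    # Restrict the search to the leader and its neighbors on the line.
    candidates = [a for a in (leader - 1, leader, leader + 1) if 0 <= a < len(means)]
    return min(candidates, key=lambda a: imed_index(means[a], counts[a], best_mean))


if __name__ == "__main__":
    means = [0.2, 0.5, 0.4, 0.1]
    print(choose_arm_unimodal(means, [10, 100, 2, 10]))
```

In this sketch an arm far from the empirical best arm is never considered, even when its own index is small. That restriction is how the sketch uses the unimodal structure; the tie-breaking and initialization rules of the actual algorithm may differ.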