We study reinforcement learning in continuous state and action spaces endowed with a metric. We provide a refined analysis of a variant of the algorithm of Sinclair, Banerjee, and Yu (2019) and show that its regret scales with the \emph{zooming dimension} of the instance. This parameter, which originates in the bandit literature, captures the size of the subsets of near-optimal actions and is never larger than the covering dimension used in previous analyses. As such, our results are the first provably adaptive guarantees for reinforcement learning in metric spaces.
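For intuition, a standard formulation of the zooming dimension from the metric-bandit literature (the notation below is illustrative, not taken from the paper): writing $\Delta(x)$ for the suboptimality gap of a point $x$ and $X_r = \{x : \Delta(x) \le r\}$ for the set of $r$-near-optimal points, the zooming dimension is, roughly,

\[
d_{\mathrm{zoom}} \;=\; \inf\Bigl\{ d > 0 \;:\; \exists\, c > 0 \text{ s.t. } \mathcal{N}_r\bigl(X_r\bigr) \le c\, r^{-d} \ \ \forall r \in (0,1] \Bigr\},
\]

where $\mathcal{N}_r(S)$ denotes the $r$-covering number of $S$. Because $X_r$ is a subset of the whole space, $d_{\mathrm{zoom}}$ is never larger than the covering dimension, and it can be much smaller on benign instances where near-optimal points are concentrated.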