On Information Gain and Regret Bounds in Gaussian Process Bandits

Consider the sequential optimization of an expensive-to-evaluate and possibly non-convex objective function $f$ from noisy feedback, which can be viewed as a continuum-armed bandit problem. Upper bounds on the regret of several learning algorithms (GP-UCB, GP-TS, and their variants) are known under both a Bayesian setting (when $f$ is a sample from a Gaussian process (GP)) and a frequentist setting (when $f$ lives in a reproducing kernel Hilbert space). These regret bounds often rely on the maximal information gain $\gamma_T$ between $T$ observations and the underlying GP (surrogate) model. We provide general bounds on $\gamma_T$ based on the decay rate of the eigenvalues of the GP kernel. Their specialisation to commonly used kernels improves the existing bounds on $\gamma_T$, and subsequently the regret bounds relying on $\gamma_T$, under numerous settings. For the Mat\'ern family of kernels, where lower bounds on $\gamma_T$ and on the regret under the frequentist setting are known, our results close a large gap, polynomial in $T$, between the upper and lower bounds (up to factors logarithmic in $T$).
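To make the quantity $\gamma_T$ concrete, the sketch below uses the standard closed form for the information gain of a GP under Gaussian observation noise, $I(y_A; f) = \tfrac{1}{2}\log\det(I + \sigma^{-2} K_A)$, where $K_A$ is the kernel matrix of the queried points $A$; $\gamma_T$ is the maximum of this quantity over all sets of $T$ points. Everything specific here is an assumption for illustration and does not come from the abstract: the squared-exponential kernel, the lengthscale and noise variance, the candidate grid, and the helper names `rbf_kernel`, `information_gain` and `greedy_information_gain`. The greedy search is a heuristic, so its output is only a lower estimate of $\gamma_T$ restricted to that grid.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=0.2):
    """Squared-exponential kernel matrix between the rows of X and Y (illustrative choice)."""
    sq_dists = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)


def information_gain(X, noise_var=0.01, lengthscale=0.2):
    """Information gain 0.5 * log det(I + K / noise_var) for the query points in X."""
    K = rbf_kernel(X, X, lengthscale)
    _, logdet = np.linalg.slogdet(np.eye(len(X)) + K / noise_var)
    return 0.5 * logdet


def greedy_information_gain(candidates, T, noise_var=0.01, lengthscale=0.2):
    """Greedily pick T distinct candidates, each time adding the point that
    maximizes the information gain of the selected set.

    Returns the chosen indices and the information gain after each pick.
    This is a heuristic lower estimate of gamma_T over the candidate set.
    """
    chosen, gains = [], []
    for _ in range(T):
        best_idx, best_gain = None, -np.inf
        for i in range(len(candidates)):
            if i in chosen:
                continue
            gain = information_gain(candidates[chosen + [i]], noise_var, lengthscale)
            if gain > best_gain:
                best_idx, best_gain = i, gain
        chosen.append(best_idx)
        gains.append(best_gain)
    return chosen, gains


if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 50)[:, None]  # hypothetical 1-D candidate grid
    _, gains = greedy_information_gain(grid, T=8)
    for t, g in enumerate(gains, start=1):
        print(f"t={t}: information gain of greedy set = {g:.3f}")
```

Running the script prints how the information gain of the greedily selected set grows with the number of observations. The growth rate of this curve in $T$ is what the eigenvalue-decay bounds on $\gamma_T$ control.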
