Machine learning algorithms are often applied repeatedly to problems with similar structure. We focus on solving a sequence of bandit optimization tasks and develop LIBO, an algorithm that adapts to the environment by learning from past experience and becomes more sample-efficient in the process. We assume a kernelized structure where the kernel is unknown but shared across all tasks. LIBO sequentially meta-learns a kernel that approximates the true kernel and solves the incoming tasks with the latest kernel estimate. Our algorithm can be paired with any kernelized or linear bandit algorithm and guarantees oracle-optimal performance, meaning that as more tasks are solved, the regret of LIBO on each task converges to the regret of the bandit algorithm with oracle knowledge of the true kernel. Naturally, if paired with a sublinear bandit algorithm, LIBO yields a sublinear lifelong regret. We also show that direct access to the data from each task is not necessary for attaining sublinear regret. We propose F-LIBO, which solves the lifelong problem in a federated manner.
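To make the lifelong loop concrete, the sketch below illustrates the general pattern the abstract describes: solve each incoming task with a base bandit algorithm under the current kernel estimate, pool the observed data, and re-estimate the kernel before the next task. This is a minimal illustrative sketch, not the authors' implementation: the base solver is a simple GP-UCB routine, the unknown shared kernel is assumed to be an RBF kernel with an unknown lengthscale, and the "meta-learning" step is approximated by a crude marginal-likelihood fit over a candidate grid. All function names (`gp_ucb_task`, `refit_lengthscale`) are hypothetical.

```python
import numpy as np

def rbf(X, Y, lengthscale):
    # Squared-exponential kernel between two sets of points.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_ucb_task(reward_fn, domain, lengthscale, horizon, noise=0.1, beta=2.0):
    """Solve one bandit task with GP-UCB under the current kernel estimate."""
    X, y = [], []
    for t in range(horizon):
        if not X:
            x = domain[np.random.randint(len(domain))]
        else:
            Xa, ya = np.array(X), np.array(y)
            K = rbf(Xa, Xa, lengthscale) + noise ** 2 * np.eye(len(Xa))
            Kinv = np.linalg.inv(K)
            Ks = rbf(domain, Xa, lengthscale)
            mu = Ks @ Kinv @ ya
            var = 1.0 - np.einsum('ij,jk,ik->i', Ks, Kinv, Ks)
            # Pick the point maximizing the upper confidence bound.
            x = domain[np.argmax(mu + beta * np.sqrt(np.maximum(var, 0.0)))]
        X.append(x)
        y.append(reward_fn(x) + noise * np.random.randn())
    return np.array(X), np.array(y)

def refit_lengthscale(X, y, candidates, noise=0.1):
    """Crude stand-in for kernel meta-learning: pick the lengthscale with the
    highest Gaussian marginal likelihood on data pooled from past tasks."""
    best, best_ll = candidates[0], -np.inf
    for ls in candidates:
        K = rbf(X, X, ls) + noise ** 2 * np.eye(len(X))
        _, logdet = np.linalg.slogdet(K)
        ll = -0.5 * (y @ np.linalg.solve(K, y) + logdet)
        if ll > best_ll:
            best, best_ll = ls, ll
    return best

# Lifelong loop: solve each task with the latest kernel estimate, then update it.
domain = np.linspace(-3, 3, 200).reshape(-1, 1)
lengthscale, pooled_X, pooled_y = 1.0, [], []
for task in range(5):
    shift = np.random.uniform(-2, 2)  # each task has a different optimum
    X, y = gp_ucb_task(lambda x: np.exp(-(x[0] - shift) ** 2 / 0.1),
                       domain, lengthscale, horizon=30)
    pooled_X.append(X)
    pooled_y.append(y)
    lengthscale = refit_lengthscale(np.concatenate(pooled_X),
                                    np.concatenate(pooled_y),
                                    candidates=[0.05, 0.1, 0.3, 1.0])
```

As the kernel estimate improves with each solved task, the base bandit's confidence bounds tighten and per-task regret approaches what would be achievable with oracle knowledge of the kernel, which is the behavior the abstract attributes to LIBO.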