Stochastic gradient-based optimization methods, such as L-SVRG and its accelerated variant L-Katyusha [12], are widely used to train machine learning models. The theoretical and empirical performance of L-SVRG and L-Katyusha can be improved by sampling observations from a non-uniform distribution [17]. However, to design a desirable sampling distribution, Qian et al. [17] rely on prior knowledge of smoothness constants, which can be computationally intractable to obtain in practice when the model parameter is high-dimensional. We propose an adaptive sampling strategy for L-SVRG and L-Katyusha that learns the sampling distribution with little computational overhead, while allowing it to change with the iterates, and that does not require any prior knowledge of the problem parameters. We prove convergence guarantees for L-SVRG and L-Katyusha for convex objectives when the sampling distribution changes with the iterates. These results show that, even without prior information, the proposed adaptive sampling strategy matches, and in some cases even surpasses, the performance of the sampling scheme of Qian et al. [17]. Extensive simulations support our theory and demonstrate the practical utility of the proposed sampling scheme on real data.