Stochastic-gradient-based optimization has been a core enabling methodology for large-scale problems in machine learning and related areas. Despite this progress, the gap between theory and practice remains significant: theoreticians pursue mathematical optimality at the cost of obtaining specialized procedures for different regimes (e.g., modulus of strong convexity, magnitude of target accuracy, signal-to-noise ratio), while practitioners cannot readily determine which regime is appropriate for their problem and seek broadly applicable algorithms that are reasonably close to optimal. To bridge these perspectives it is necessary to study algorithms that are adaptive to different regimes. We present the stochastically controlled stochastic gradient (SCSG) method for composite convex finite-sum optimization problems and show that SCSG is adaptive to both strong convexity and target accuracy. The adaptivity is achieved by batch variance reduction with adaptive batch sizes and a novel technique, which we refer to as geometrization, that sets the length of each epoch as a geometric random variable. The algorithm achieves strictly better theoretical complexity than other existing adaptive algorithms, while its tuning parameters depend only on the smoothness parameter of the objective.
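To make the geometrization idea concrete, the following is a minimal, hedged sketch of one SCSG-style epoch: a batch gradient serves as the variance-reduction anchor, and the number of inner SVRG-style steps is drawn from a geometric distribution rather than fixed. The function names (`scsg_epoch`, `grad_fn`) and parameter choices are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def scsg_epoch(w, grad_fn, n, batch_size, eta, p_inner):
    """One illustrative SCSG epoch (sketch, not the paper's exact algorithm).

    w          -- current iterate (ndarray)
    grad_fn    -- grad_fn(w, idx): average gradient over component indices idx
    n          -- number of components in the finite sum
    batch_size -- anchor batch size B
    eta        -- step size
    p_inner    -- success probability of the geometric epoch length
    """
    # Anchor: gradient over a random batch, used for variance reduction.
    batch = rng.choice(n, size=batch_size, replace=False)
    mu = grad_fn(w, batch)
    w_anchor = w.copy()

    # Geometrization: epoch length N ~ Geometric(p_inner), mean 1/p_inner.
    num_steps = rng.geometric(p_inner)
    for _ in range(num_steps):
        i = rng.integers(n, size=1)
        # SVRG-style variance-reduced stochastic gradient.
        v = grad_fn(w, i) - grad_fn(w_anchor, i) + mu
        w = w - eta * v
    return w
```

In this sketch, setting `p_inner = 1/batch_size` gives an expected epoch length equal to the anchor batch size, which mirrors the batch-to-epoch coupling the abstract alludes to; the randomized epoch length is what enables the telescoping arguments behind the adaptivity analysis.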