A Discrete Variational Derivation of Accelerated Methods in Optimization

Many recent developments in machine learning are connected with gradient-based optimization methods. Recently, these methods have been studied from a variational perspective. This has opened up the possibility of introducing variational and symplectic integration methods using geometric integrators. In particular, in this paper, we introduce variational integrators which allow us to derive different methods for optimization. Using both Hamilton's principle and the Lagrange-d'Alembert principle, we derive two families of optimization methods in one-to-one correspondence that generalize Polyak's heavy ball and the well-known Nesterov accelerated gradient method. The methods mimic the behavior of the latter, which reduces the oscillations of typical momentum methods. However, since the systems considered are explicitly time-dependent, the preservation of symplecticity of autonomous systems occurs here solely on the fibers. Several experiments illustrate the result.
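For orientation, the two classical methods that the derived families generalize can be sketched in their standard textbook forms. This is a minimal illustration, not the variational integrators derived in the paper: the diagonal quadratic objective, the step size `lr`, the momentum coefficient `mu`, and all function names are assumptions chosen for the example.

```python
# Textbook heavy ball vs. Nesterov accelerated gradient on an assumed
# diagonal quadratic f(x) = 0.5 * sum(c_i * x_i^2). Illustrative only;
# these are not the discrete variational schemes from the paper.

def objective(x, curv):
    """Value of the assumed quadratic test objective."""
    return 0.5 * sum(c * xi * xi for c, xi in zip(curv, x))

def grad(x, curv):
    """Gradient of the assumed quadratic test objective."""
    return [c * xi for c, xi in zip(curv, x)]

def heavy_ball(x0, curv, lr, mu, steps):
    """Polyak's heavy ball: gradient taken at the current iterate x_k."""
    x_prev, x = list(x0), list(x0)
    for _ in range(steps):
        g = grad(x, curv)
        x_next = [xi - lr * gi + mu * (xi - xpi)
                  for xi, gi, xpi in zip(x, g, x_prev)]
        x_prev, x = x, x_next
    return x

def nesterov(x0, curv, lr, mu, steps):
    """Nesterov's method: gradient taken at the look-ahead point y_k."""
    x_prev, x = list(x0), list(x0)
    for _ in range(steps):
        y = [xi + mu * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        g = grad(y, curv)
        x_next = [yi - lr * gi for yi, gi in zip(y, g)]
        x_prev, x = x, x_next
    return x

# Hypothetical problem data and hyperparameters for the comparison.
CURV = [1.0, 10.0]
X0 = [1.0, 1.0]
LR, MU, STEPS = 0.05, 0.9, 300

x_hb = heavy_ball(X0, CURV, LR, MU, STEPS)
x_nag = nesterov(X0, CURV, LR, MU, STEPS)
print("heavy ball f(x):", objective(x_hb, CURV))
print("Nesterov   f(x):", objective(x_nag, CURV))
```

The only difference between the two loops is where the gradient is evaluated: at the current iterate for heavy ball, and at the extrapolated point `y` for Nesterov. That look-ahead evaluation is the standard explanation for the damping of oscillations that the abstract attributes to Nesterov's method.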
