Arguably, the two most popular accelerated, or momentum-based, optimization methods in machine learning are Nesterov's accelerated gradient and Polyak's heavy ball, each corresponding to a different discretization of a particular second-order differential equation with friction. Such connections with continuous-time dynamical systems have been instrumental in demystifying acceleration phenomena in optimization. Here we study structure-preserving discretizations for a certain class of dissipative (conformal) Hamiltonian systems, allowing us to analyze the symplectic structure of both Nesterov's method and the heavy ball, besides providing several new insights into these methods. Moreover, we propose a new algorithm based on a dissipative relativistic system that normalizes the momentum and may result in more stable and/or faster optimization. Importantly, this method generalizes both Nesterov's method and the heavy ball, each being recovered as a distinct limiting case, and has potential advantages at no additional computational cost.
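To make the momentum-normalization idea concrete, here is a minimal illustrative sketch, not the paper's exact discretization: a heavy-ball-style momentum update whose displacement is rescaled by a relativistic factor, so the effective step length stays bounded even when gradients are large. The function name `relativistic_gd` and the parameters `lr`, `mu`, and the light-speed-like bound `c` are illustrative assumptions.

```python
import numpy as np

def relativistic_gd(grad, x0, steps=500, lr=0.01, mu=0.9, c=1.0):
    """Illustrative momentum method with relativistic normalization.

    Hypothetical parameterization for exposition only; the paper's
    algorithm uses a specific structure-preserving discretization.
    """
    x = np.asarray(x0, dtype=float)
    p = np.zeros_like(x)  # momentum
    for _ in range(steps):
        # Heavy-ball-style momentum accumulation with friction mu
        p = mu * p - lr * grad(x)
        # Relativistic factor: the displacement norm is bounded by c,
        # normalizing the momentum when ||p|| is large
        x = x + p / np.sqrt(1.0 + np.dot(p, p) / c**2)
    return x
```

For small momenta the factor approaches 1 and the update reduces to the classical heavy ball, while large momenta are damped toward a bounded step, which is the source of the potential stability gains mentioned above.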