Anderson加速协调下降 (Anderson acceleration of coordinate descent)

Acceleration of first order methods is mainly obtained via inertial techniques \`a la Nesterov, or via nonlinear extrapolation. The latter has known a recent surge of interest, with successful applications to gradient and proximal gradient techniques. On multiple Machine Learning problems, coordinate descent achieves performance significantly superior to full-gradient methods. Speeding up coordinate descent in practice is not easy: inertially accelerated versions of coordinate descent are theoretically accelerated, but might not always lead to practical speed-ups. We propose an accelerated version of coordinate descent using extrapolation, showing considerable speed up in practice, compared to inertial accelerated coordinate descent and extrapolated (proximal) gradient descent. Experiments on least squares, Lasso, elastic net and logistic regression validate the approach.

翻译：加速第一顺序方法主要通过惯性技术( ⁇ a la Nesterov)或非线性外推法获得,后者最近发现兴趣激增,成功地应用了梯度和近似梯度技术。在多机学习问题上,协调世系的性能明显优于完全梯度方法。在实际中加快协调世系并非易事:惯性加速型协调世系在理论上加速,但不一定总能导致实际加速。我们提议采用外推法加速协调世系的加速版,与惯性加速协调世系和外推(精度)梯度下降相比,在实际中显示相当快的速度。在最小方的实验、拉索、弹性网和物流回归实验证实了这一方法。

相关内容

坐标下降

关注 0

坐标下降法（coordinate descent）是一种非梯度优化算法。算法在每次迭代中，在当前点处沿一个坐标方向进行一维搜索以求得一个函数的局部极小值。在整个过程中循环使用不同的坐标方向。对于不可拆分的函数而言，算法可能无法在较小的迭代步数中求得最优解。为了加速收敛，可以采用一个适当的坐标系，例如通过主成分分析获得一个坐标间尽可能不相互关联的新坐标系.

【CVPR 2021】变换器跟踪TransT: Transformer Tracking

专知会员服务

22+阅读 · 2021年4月20日

INRIA 最新《机器学习理论》课程笔记，176页pdf

专知会员服务

51+阅读 · 2020年12月14日

【深度学习架构、模型和技巧集合(TensorFlow/PyTorch)】’Deep Learning Models - A collection of various deep learning architectures, models, and tips'

专知会员服务

58+阅读 · 2020年1月25日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日