We present a novel optimizer for deep neural networks that combines the ideas of Newton's method and line search to efficiently compute and utilize curvature information. Our work is based on the empirical observation that the loss function can be approximated by a parabola in the direction of the negative gradient. Exploiting this approximation, we perform a variable, loss-function-dependent parameter update by jumping directly to the minimum of the approximated parabola. To evaluate our optimizer, we performed multiple comprehensive hyperparameter grid searches, for which we trained more than 20000 networks in total. We show that PAL outperforms RMSPROP and can outperform gradient descent with momentum and ADAM on large-scale, high-dimensional machine learning problems. Furthermore, PAL requires up to 52.2% fewer training epochs. PyTorch and TensorFlow implementations are provided at https://github.com/cogsys-tuebingen/PAL.
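The authors' reference implementations are in the linked repository; purely as a rough illustration of the update the abstract describes, the following minimal PyTorch sketch fits a one-dimensional parabola to the loss along the normalized negative gradient, using the loss at the current point, its directional derivative, and one extra loss measurement, and then jumps to the parabola's minimum. The names `pal_step`, `loss_fn`, and `mu`, as well as the fallback when the fitted parabola is not convex, are assumptions of this sketch and not the authors' API.

```python
import torch

def pal_step(params, loss_fn, mu=0.1):
    """One parabolic-approximation line-search update (illustrative sketch).

    `params` is a list of tensors with requires_grad=True; `loss_fn`
    re-evaluates the loss of the current batch at the current parameters.
    `mu` is the probe step length (a hyperparameter of this sketch).
    """
    # Loss and gradient at the current point t = 0.
    loss0 = loss_fn()
    grads = torch.autograd.grad(loss0, params)
    grad_norm = torch.sqrt(sum((g * g).sum() for g in grads)) + 1e-12

    # Along the normalized negative gradient d = -g/||g||, the line loss
    # l(t) has value l(0) = c and directional derivative l'(0) = b = -||g||.
    c = loss0.item()
    b = -grad_norm.item()

    with torch.no_grad():
        # Probe step: move distance mu along d and measure the loss there.
        for p, g in zip(params, grads):
            p -= mu * g / grad_norm
        l_mu = loss_fn().item()

        # Fit l(t) ~ a*t^2 + b*t + c through the probe point: a from l(mu).
        a = (l_mu - c - b * mu) / mu ** 2
        if a > 0:
            t_star = -b / (2 * a)  # jump to the parabola's minimum
        else:
            t_star = mu            # non-convex fit: keep the probe step (assumed fallback)
        # Move from the probe point (t = mu) to t = t_star.
        for p, g in zip(params, grads):
            p -= (t_star - mu) * g / grad_norm
    return c
```

In this sketch, each update costs one extra forward pass (the probe evaluation) on top of the usual forward and backward pass of a plain SGD step.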