Hyperparameter tuning is a bothersome step in the training of deep learning models. One of the most sensitive hyperparameters is the learning rate of gradient descent. We present the 'All Learning Rates At Once' (Alrao) optimization method for neural networks: each unit or feature in the network gets its own learning rate sampled from a random distribution spanning several orders of magnitude. This comes at practically no computational cost. Perhaps surprisingly, stochastic gradient descent (SGD) with Alrao performs close to SGD with an optimally tuned learning rate, for various architectures and problems. Alrao could save time when testing deep learning models: a range of models could be quickly assessed with Alrao, and the most promising models could then be trained more extensively. This text comes with a PyTorch implementation of the method, which can be plugged into an existing PyTorch model: https://github.com/leonardblier/alrao .
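To illustrate the core idea of per-unit learning rates sampled from a distribution spanning several orders of magnitude, here is a minimal PyTorch sketch. It is not the authors' implementation (see the linked repository for that): the log-uniform distribution, the bounds `lr_min`/`lr_max`, and the manual update loop are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

def sample_unit_learning_rates(param, lr_min=1e-5, lr_max=1e1):
    """Sample one learning rate per unit (first dimension of the parameter)
    from a log-uniform distribution, then broadcast it to the parameter's
    shape. The bounds lr_min/lr_max are illustrative, not the paper's."""
    n_units = param.shape[0]
    log_lr = math.log(lr_min) + torch.rand(n_units) * (math.log(lr_max) - math.log(lr_min))
    lrs = torch.exp(log_lr)
    # Broadcast each unit's rate over the remaining dimensions of the parameter.
    return lrs.view(-1, *([1] * (param.dim() - 1))).expand_as(param)

model = nn.Linear(10, 5)
unit_lrs = {name: sample_unit_learning_rates(p) for name, p in model.named_parameters()}

# One manual SGD step where every unit uses its own fixed learning rate.
x, y = torch.randn(8, 10), torch.randn(8, 5)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
with torch.no_grad():
    for name, p in model.named_parameters():
        p -= unit_lrs[name] * p.grad
        p.grad.zero_()
```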