Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions

In many numerical simulations, stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs). To this day, however, it remains an open research problem to provide a mathematical convergence analysis that rigorously explains this success. In this work we study SGD type optimization methods in the training of fully-connected feedforward DNNs with rectified linear unit (ReLU) activation. We first establish general regularity properties for the risk functions and their generalized gradient functions appearing in the training of such DNNs. Thereafter, we investigate the plain vanilla SGD optimization method in the training of such DNNs under the assumption that the target function under consideration is a constant function. Specifically, we prove that if the learning rates (the step sizes of the SGD optimization method) are sufficiently small but not $L^1$-summable and the target function is a constant function, then the expectation of the risk of the considered SGD process converges to zero as the number of SGD steps increases to infinity.
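The setting described in the abstract can be tried out numerically. The following is a minimal sketch and not the construction used in the paper: it assumes a single hidden layer (the paper treats deep networks), scalar inputs drawn uniformly from [0, 1], a constant learning rate (small, and trivially not summable), and the convention that the generalized derivative of ReLU at 0 is 0. All function names and parameter values (`train`, `sgd_step`, `width`, `lr`, and so on) are hypothetical choices made for illustration. It reports the risk of one SGD run, whereas the result in the abstract concerns the expectation of the risk.

```python
import random

def relu(z):
    return z if z > 0.0 else 0.0

def forward(params, x):
    """One-hidden-layer ReLU network R -> R; returns (output, pre-activations)."""
    w1, b1, w2, b2 = params
    pre = [w1[i] * x + b1[i] for i in range(len(w1))]
    out = b2 + sum(w2[i] * relu(pre[i]) for i in range(len(w1)))
    return out, pre

def risk(params, target, grid=200):
    """Mean squared error against the constant target on a uniform grid over [0, 1]."""
    xs = [k / (grid - 1) for k in range(grid)]
    return sum((forward(params, x)[0] - target) ** 2 for x in xs) / grid

def sgd_step(params, x, target, lr):
    """One plain SGD step on the single-sample loss (f(x) - target)^2."""
    w1, b1, w2, b2 = params
    out, pre = forward(params, x)
    err = 2.0 * (out - target)  # derivative of the squared loss w.r.t. the output
    for i in range(len(w1)):
        # Assumed convention: the generalized ReLU derivative is 0 at the kink.
        active = 1.0 if pre[i] > 0.0 else 0.0
        g_w2 = err * relu(pre[i])
        g_hidden = err * w2[i] * active  # uses w2[i] before it is updated
        w2[i] -= lr * g_w2
        w1[i] -= lr * g_hidden * x
        b1[i] -= lr * g_hidden
    b2 -= lr * err
    return w1, b1, w2, b2

def train(seed=0, width=8, steps=20000, lr=0.01, target=1.0):
    """Run SGD with a constant step size; return (initial risk, final risk)."""
    rng = random.Random(seed)
    init = lambda: [rng.uniform(-0.5, 0.5) for _ in range(width)]
    params = (init(), init(), init(), 0.0)
    initial = risk(params, target)
    for _ in range(steps):
        params = sgd_step(params, rng.random(), target, lr)
    return initial, risk(params, target)

if __name__ == "__main__":
    initial, final = train()
    print(f"risk before: {initial:.4f}, after: {final:.2e}")
```

Averaging the final risk over several seeds would give a rough Monte Carlo estimate of the expected risk, which is the quantity the convergence statement refers to.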
