On Feature Learning in Neural Networks with Global Convergence Guarantees

We study the optimization of wide neural networks (NNs) via gradient flow (GF) in setups that allow feature learning while admitting non-asymptotic global convergence guarantees. First, for wide shallow NNs under the mean-field scaling and with a general class of activation functions, we prove that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF. Building upon this analysis, we study a model of wide multi-layer NNs whose second-to-last layer is trained via GF, for which we also prove a linear-rate convergence of the training loss to zero, but regardless of the input dimension. We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
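The shallow setting described above can be illustrated with a small numerical sketch. It is not the paper's construction: the tanh activation, the Gaussian data and initialization, training both layers, the function name `train_mean_field_shallow`, and all hyperparameters are assumptions chosen for illustration. Gradient flow is approximated by plain gradient descent with a small step, and the input dimension is set no smaller than the number of training points.

```python
import numpy as np


def train_mean_field_shallow(n=8, d=16, m=512, steps=2000, lr=1.0, seed=0):
    """Gradient descent on a two-layer net f(x) = (1/m) * sum_i a_i * tanh(w_i . x).

    Illustrative assumptions (not taken from the paper): tanh activation,
    Gaussian data and initialization, both layers trained, squared loss.
    Returns the training loss recorded at every step.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d)) / np.sqrt(d)  # n inputs in dimension d >= n
    y = rng.standard_normal(n)                    # arbitrary real-valued targets
    W = rng.standard_normal((m, d))               # first-layer weights
    a = rng.standard_normal(m)                    # second-layer weights

    losses = np.empty(steps)
    for t in range(steps):
        H = np.tanh(X @ W.T)                      # hidden activations, shape (n, m)
        pred = H @ a / m                          # mean-field 1/m output scaling
        resid = pred - y
        losses[t] = 0.5 * np.mean(resid ** 2)

        # Gradients of the loss (1 / 2n) * sum_k resid_k^2.
        grad_a = H.T @ resid / (n * m)
        G = resid[:, None] * (1.0 - H ** 2) * a[None, :]
        grad_W = G.T @ X / (n * m)

        # Step size multiplied by m so that each neuron moves by O(1),
        # which is the usual time scaling in the mean-field regime.
        a -= lr * m * grad_a
        W -= lr * m * grad_W
    return losses


if __name__ == "__main__":
    losses = train_mean_field_shallow()
    for t in (0, 500, 1000, 1999):
        print(f"step {t:4d}  loss {losses[t]:.3e}")
```

Plotting `np.log(losses)` against the step index is one way to check whether the decay looks approximately geometric in this toy setting. A single run with one seed is only an illustration and says nothing about the guarantees themselves.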


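Related content

Representation learning

In machine learning, representation learning (also called feature learning) is a set of techniques that allow a system to automatically discover, from raw data, the representations needed for feature detection or classification. This replaces manual feature engineering and lets a machine both learn the features and use them to perform a specific task. In supervised representation learning, features are learned from labeled input data; examples include supervised neural networks, multilayer perceptrons, and (supervised) dictionary learning. In unsupervised representation learning, features are learned from unlabeled input data; examples include dictionary learning, independent component analysis, autoencoders, matrix factorization, and various forms of clustering.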
