Non-Asymptotic Optimization and Generalization Bounds for Stochastic Gauss-Newton in Overparameterized Models - Zhuanzhi Papers

An important question in deep learning is how higher-order optimization methods affect generalization. In this work, we analyze a stochastic Gauss-Newton (SGN) method with Levenberg-Marquardt damping and mini-batch sampling for training overparameterized deep neural networks with smooth activations in a regression setting. Our theoretical contributions are twofold. First, we establish finite-time convergence bounds via a variable-metric analysis in parameter space, with explicit dependencies on the batch size, network width and depth. Second, we derive non-asymptotic generalization bounds for SGN using uniform stability in the overparameterized regime, characterizing the impact of curvature, batch size, and overparameterization on generalization performance. Our theoretical results identify a favorable generalization regime for SGN in which a larger minimum eigenvalue of the Gauss-Newton matrix along the optimization path yields tighter stability bounds.

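The abstract does not state the update rule, so the following is only a minimal sketch of what a mini-batch Gauss-Newton step with Levenberg-Marquardt damping usually looks like for squared-loss regression: sample a batch B, form the batch Jacobian J_B and residual r_B, and move along the solution of (J_B^T J_B / |B| + lam * I) d = J_B^T r_B / |B|. The one-hidden-layer tanh network, the 1/sqrt(width) scaling, and every name and hyperparameter here (`predict`, `jacobian`, `sgn_step`, `demo`, `lam`, `lr`) are illustrative assumptions, not the paper's algorithm or constants.

```python
import numpy as np


def predict(theta, X, width):
    """One-hidden-layer tanh network f(x) = a^T tanh(W x) / sqrt(width) (assumed model)."""
    d = X.shape[1]
    W = theta[: width * d].reshape(width, d)
    a = theta[width * d:]
    return np.tanh(X @ W.T) @ a / np.sqrt(width)


def jacobian(theta, X, width):
    """Jacobian of the network outputs w.r.t. theta, shape (batch, num_params)."""
    d = X.shape[1]
    W = theta[: width * d].reshape(width, d)
    a = theta[width * d:]
    H = np.tanh(X @ W.T)                      # hidden activations, (batch, width)
    dH = (1.0 - H ** 2) * a                   # a_k * tanh'(w_k . x), (batch, width)
    JW = (dH[:, :, None] * X[:, None, :]).reshape(X.shape[0], -1)
    return np.hstack([JW, H]) / np.sqrt(width)


def loss(theta, X, y, width):
    """Half mean squared error over the full data set."""
    return 0.5 * np.mean((predict(theta, X, width) - y) ** 2)


def sgn_step(theta, X, y, width, batch_size, lam, lr, rng):
    """One damped Gauss-Newton step on a uniformly sampled mini-batch."""
    idx = rng.choice(X.shape[0], size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    r = predict(theta, Xb, width) - yb        # batch residuals
    J = jacobian(theta, Xb, width)            # batch Jacobian
    # Damped Gauss-Newton matrix; lam > 0 keeps it invertible even though
    # J^T J has rank at most batch_size in the overparameterized regime.
    G = J.T @ J / batch_size + lam * np.eye(theta.size)
    g = J.T @ r / batch_size                  # mini-batch gradient of the loss
    return theta - lr * np.linalg.solve(G, g)


def demo(n=20, d=3, width=16, batch_size=10, lam=0.1, lr=0.5, steps=100, seed=0):
    """Train on synthetic data; returns (initial_loss, final_loss)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    y = np.sin(X @ np.ones(d))                # synthetic regression targets
    theta = rng.normal(size=width * d + width)  # 64 parameters > 20 samples
    initial = loss(theta, X, y, width)
    for _ in range(steps):
        theta = sgn_step(theta, X, y, width, batch_size, lam, lr, rng)
    return initial, loss(theta, X, y, width)


if __name__ == "__main__":
    initial, final = demo()
    print(f"initial loss {initial:.4f}, final loss {final:.4f}")
```

In this sketch, `lam` bounds the smallest eigenvalue of the preconditioner from below, and `batch_size` and `width` are the quantities that the abstract says enter the convergence and stability bounds explicitly.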
