The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent




Yatin Dandi, Luca Pesce, Lenka Zdeborová, Florent Krzakala

Understanding the advantages of deep neural networks trained by gradient descent (GD) compared to shallow models remains an open theoretical challenge. In this paper, we introduce a class of target functions (single- and multi-index Gaussian hierarchical targets) that incorporate a hierarchy of latent subspace dimensionalities. This framework enables us to analytically study the learning dynamics and generalization performance of deep networks compared to shallow ones in the high-dimensional limit. Specifically, our main theorem shows that feature learning with GD successively reduces the effective dimensionality, transforming a high-dimensional problem into a sequence of lower-dimensional ones. This enables learning the target function with drastically fewer samples than with shallow networks. While the results are proven in a controlled training setting, we also discuss more common training procedures and argue that they learn through the same mechanisms.
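To make the idea of a "hierarchy of latent subspace dimensionalities" concrete, here is a minimal toy sketch in plain Python. It is an illustration under stated assumptions, not the paper's exact definition: the function name `make_toy_hierarchical_target`, the sizes `d` and `d1`, the quadratic feature `z**2 - 1`, and the `tanh` output are all hypothetical choices made for this example. The target maps a Gaussian input in dimension `d` to `d1 < d` latent projections, then to a single scalar, so the effective dimension shrinks at each level.

```python
import math
import random


def unit_gaussian_vector(n, rng):
    """Draw a Gaussian vector of length n and rescale it to unit norm."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def make_toy_hierarchical_target(d, d1, seed=0):
    """Return a toy target with latent dimensions d -> d1 -> 1.

    All modelling choices here are assumptions made for illustration only.
    """
    rng = random.Random(seed)
    # Level 1: d1 hidden unit-norm directions in the d-dimensional input space.
    W = [unit_gaussian_vector(d, rng) for _ in range(d1)]
    # Level 2: one hidden unit-norm direction in the d1-dimensional feature space.
    a = unit_gaussian_vector(d1, rng)

    def target(x):
        # Project the input onto the d1 hidden directions.
        z = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in W]
        # Apply a fixed nonlinearity (second Hermite polynomial) to each projection.
        h = [z_j * z_j - 1.0 for z_j in z]
        # Collapse the d1 features to a single scalar, then apply the output link.
        s = sum(a_j * h_j for a_j, h_j in zip(a, h))
        return math.tanh(s)

    return target


# Example usage: evaluate the toy target on one standard Gaussian input.
rng = random.Random(1)
d, d1 = 64, 8
f = make_toy_hierarchical_target(d, d1)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
y = f(x)
```

In this toy setting, a learner that first recovers the `d1` intermediate features only has to solve a `d1`-dimensional problem at the next level, which is the kind of successive dimensionality reduction the abstract attributes to feature learning with GD.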




