Hessian:了解神经网络中Hessian人的共同结构 (Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks) - 专知论文

会员服务 ·

0

可理解性 · Neural Networks · Networking · 秩 · Better ·

2020 年 11 月 30 日

Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks

翻译：Hessian:了解神经网络中Hessian人的共同结构

Yikai Wu,Xingyu Zhu,Chenwei Wu,Annie Wang,Rong Ge

from arxiv, 31 pages, 27 figures. Main text: 9 pages, 7 figures. First two authors have equal contribution and are in alphabetical order

Hessian captures important properties of the deep neural network loss landscape. We observe that eigenvectors and eigenspaces of the layer-wise Hessian for neural network objective have several interesting structures -- top eigenspaces for different models have high overlap, and top eigenvectors form low rank matrices when they are reshaped into the same shape as the corresponding weight matrix. These structures, as well as the low rank structure of the Hessian observed in previous studies, can be explained by approximating the Hessian using Kronecker factorization. Our new understanding can also explain why some of these structures become weaker when the network is trained with batch normalization. Finally, we show that the Kronecker factorization can be combined with PAC-Bayes techniques to get better explicit generalization bounds.

翻译：Hesian 捕捉了深层神经网络损失地貌的重要特性。我们观察到, 层- 神经网络目标Hessian 层- 层- 神经网络目标的源生体和源空间有几种有趣的结构 -- -- 不同模型的顶层神经空间重叠程度较高,顶层脑源在重塑成与相应重量矩阵相同的形状时形成低级矩阵。这些结构, 以及以前研究中观察到的赫西安人的低级结构, 可以用使用Kronecker 系数化来接近赫西安人来解释。我们的新理解还可以解释为什么在对网络进行批次正常化培训时, 其中一些结构会变得较弱。最后, 我们表明, Kronecker 系数化可以与PAC- Bayes 技术相结合, 以获得更清晰的概括化界限。

0

相关内容

可理解性

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

斯坦福2020硬课《分布式算法与优化》

斯坦福2020硬课《分布式算法与优化》

专知会员服务

123+阅读 · 2020年5月6日

神经网络的拓扑结构，TOPOLOGY OF DEEP NEURAL NETWORKS

神经网络的拓扑结构，TOPOLOGY OF DEEP NEURAL NETWORKS

专知会员服务

35+阅读 · 2020年4月15日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【斯坦福大学】Dropout的隐性和显性正则化效应，Regularization Effects

【斯坦福大学】Dropout的隐性和显性正则化效应，Regularization Effects

专知会员服务

34+阅读 · 2020年3月4日

【北京智源大会2019】神经网络的优化Optimization for Overparametrized Deep Neural Networks，北京大学 | 王立威

【北京智源大会2019】神经网络的优化Optimization for Overparametrized Deep Neural Networks，北京大学 | 王立威

专知会员服务

23+阅读 · 2019年11月21日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【论文Slides分享】多媒体顶级会议ACM Multimedia 2017 China 论文宣讲会报告Slides资料分享

【论文Slides分享】多媒体顶级会议ACM Multimedia 2017 China 论文宣讲会报告Slides资料分享

专知

7+阅读 · 2017年10月18日

【推荐】MXNet深度情感分析实战

【推荐】MXNet深度情感分析实战

机器学习研究会

16+阅读 · 2017年10月4日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Distributed Training and Optimization Of Neural Networks

Distributed Training and Optimization Of Neural Networks

Arxiv

0+阅读 · 2021年1月15日

Random and quasi-random designs in group testing

Random and quasi-random designs in group testing

Arxiv

0+阅读 · 2021年1月15日

Statistical analysis of Mapper for stochastic and multivariate filters

Arxiv

0+阅读 · 2021年1月15日

Any-Precision Deep Neural Networks

Arxiv

0+阅读 · 2021年1月15日

Pretrained Transformers Improve Out-of-Distribution Robustness

Arxiv

5+阅读 · 2020年4月13日

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT

Arxiv

3+阅读 · 2019年9月12日

Towards Understanding Regularization in Batch Normalization

Towards Understanding Regularization in Batch Normalization

Arxiv

4+阅读 · 2018年9月27日

Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer

Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer

Arxiv

3+阅读 · 2018年7月19日

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Arxiv

7+阅读 · 2018年6月1日

Understanding disentangling in $β$-VAE

Arxiv

4+阅读 · 2018年4月10日

VIP会员

文章信息

相关主题

Neural Networks

相关VIP内容

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

斯坦福2020硬课《分布式算法与优化》

斯坦福2020硬课《分布式算法与优化》

专知会员服务

123+阅读 · 2020年5月6日

神经网络的拓扑结构，TOPOLOGY OF DEEP NEURAL NETWORKS

神经网络的拓扑结构，TOPOLOGY OF DEEP NEURAL NETWORKS

专知会员服务

35+阅读 · 2020年4月15日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【斯坦福大学】Dropout的隐性和显性正则化效应，Regularization Effects

【斯坦福大学】Dropout的隐性和显性正则化效应，Regularization Effects

专知会员服务

34+阅读 · 2020年3月4日

【北京智源大会2019】神经网络的优化Optimization for Overparametrized Deep Neural Networks，北京大学 | 王立威

【北京智源大会2019】神经网络的优化Optimization for Overparametrized Deep Neural Networks，北京大学 | 王立威

专知会员服务

23+阅读 · 2019年11月21日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《美空军条令出版物：战略打击》最新条令

《高能激光武器》22页slides

军事前沿模型

《面向小型无人机或无人飞行器的创新雷达探测与人工智能分类技术》263页

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【论文Slides分享】多媒体顶级会议ACM Multimedia 2017 China 论文宣讲会报告Slides资料分享

【论文Slides分享】多媒体顶级会议ACM Multimedia 2017 China 论文宣讲会报告Slides资料分享

专知

7+阅读 · 2017年10月18日

【推荐】MXNet深度情感分析实战

【推荐】MXNet深度情感分析实战

机器学习研究会

16+阅读 · 2017年10月4日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Distributed Training and Optimization Of Neural Networks

Distributed Training and Optimization Of Neural Networks

Arxiv

0+阅读 · 2021年1月15日

Random and quasi-random designs in group testing

Random and quasi-random designs in group testing

Arxiv

0+阅读 · 2021年1月15日

Statistical analysis of Mapper for stochastic and multivariate filters

Arxiv

0+阅读 · 2021年1月15日

Any-Precision Deep Neural Networks

Arxiv

0+阅读 · 2021年1月15日

Pretrained Transformers Improve Out-of-Distribution Robustness

Arxiv

5+阅读 · 2020年4月13日

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT

Arxiv

3+阅读 · 2019年9月12日

Towards Understanding Regularization in Batch Normalization

Towards Understanding Regularization in Batch Normalization

Arxiv

4+阅读 · 2018年9月27日

Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer

Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer

Arxiv

3+阅读 · 2018年7月19日

Optimal Algorithms for Non-Smooth Distributed Optimization in Networks

Arxiv

7+阅读 · 2018年6月1日

Understanding disentangling in $β$-VAE

Arxiv

4+阅读 · 2018年4月10日

微信扫码咨询专知VIP会员