TENGraD: Time-Efficient Natural Gradient Descent with Exact Fisher-Block Inversion - 专知 (Zhuanzhi) Papers


June 7, 2021

TENGraD: Time-Efficient Natural Gradient Descent with Exact Fisher-Block Inversion

Saeed Soori, Bugra Can, Baourun Mu, Mert Gürbüzbalaban, Maryam Mehri Dehnavi

This work proposes a time-efficient Natural Gradient Descent (NGD) method, called TENGraD, with linear convergence guarantees. Computing the inverse of the neural network's Fisher information matrix is expensive in NGD because the Fisher matrix is large. Approximate NGD methods such as KFAC attempt to improve NGD's running time and practical applicability by reducing the cost of the Fisher matrix inversion through approximation. However, the approximations do not reduce the overall time significantly, and they lead to less accurate parameter updates and a loss of curvature information. TENGraD improves the time efficiency of NGD by computing Fisher block inverses with a computationally efficient covariance factorization and reuse method. It computes the inverse of each block exactly using the Woodbury matrix identity, preserving curvature information while admitting fast (linear) convergence rates. Our experiments on image classification tasks for state-of-the-art deep neural architectures on CIFAR-10, CIFAR-100, and Fashion-MNIST show that TENGraD significantly outperforms state-of-the-art NGD methods, and often stochastic gradient descent, in wall-clock time.
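To make the role of the Woodbury matrix identity concrete, here is a minimal NumPy sketch. It is not the TENGraD algorithm itself: the abstract does not give the covariance factorization or the reuse scheme, so both are omitted. The sketch assumes a single layer's Fisher block is the damped empirical Fisher `U.T @ U / n + damping * I`, where `U` (hypothetical name) stacks `n` per-sample gradients of dimension `d`; the damping term and the function names are illustrative assumptions, not taken from the paper. Under those assumptions the identity gives

$$\left(\lambda I_d + \tfrac{1}{n} U^\top U\right)^{-1} g = \tfrac{1}{\lambda}\left(g - U^\top \left(n\lambda I_n + U U^\top\right)^{-1} U g\right),$$

so only an `n x n` system has to be solved instead of a `d x d` one, and the result is still exact.

```python
import numpy as np


def woodbury_ngd_direction(U, g, damping):
    """Solve (damping*I + U.T @ U / n) x = g via the Woodbury identity.

    U       : (n, d) per-sample gradients for one layer (assumed layout)
    g       : (d,)   gradient to be preconditioned
    damping : positive scalar (an assumption; not specified in the abstract)
    """
    n = U.shape[0]
    # Small n x n system in sample space instead of a d x d system.
    K = U @ U.T + n * damping * np.eye(n)
    return (g - U.T @ np.linalg.solve(K, U @ g)) / damping


def direct_ngd_direction(U, g, damping):
    """Reference: form the d x d damped Fisher block and solve it directly."""
    n, d = U.shape
    F = U.T @ U / n + damping * np.eye(d)
    return np.linalg.solve(F, g)


rng = np.random.default_rng(0)
U = rng.standard_normal((8, 50))   # n = 8 samples, d = 50 parameters
g = U.mean(axis=0)                 # mean gradient over the mini-batch

fast = woodbury_ngd_direction(U, g, damping=0.1)
slow = direct_ngd_direction(U, g, damping=0.1)
max_abs_diff = float(np.max(np.abs(fast - slow)))
```

With a mini-batch much smaller than the layer size (`n << d`), the Woodbury route replaces an O(d^3) solve with an O(n^3) solve plus matrix-vector products, and `max_abs_diff` should sit at floating-point round-off level because no approximation is involved.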
