Estimating and Maximizing Mutual Information for Knowledge Distillation - Zhuanzhi Papers

Student Networks · INFORMS · Mutual Information · Estimation/Estimators · Distillation

November 29, 2021

Aman Shrivastava, Yanjun Qi, Vicente Ordonez

In this work, we propose Mutual Information Maximization Knowledge Distillation (MIMKD). Our method uses a contrastive objective to simultaneously estimate and maximize a lower bound on the mutual information between the local and global feature representations of a teacher network and those of a student network. We show through extensive experiments that this improves the performance of low-capacity models by transferring knowledge from higher-performing but computationally expensive models, yielding better models that can run on devices with limited computational resources. The method is flexible: it can distill knowledge from teachers with arbitrary network architectures into arbitrary student networks. Our empirical results show that MIMKD outperforms competing approaches across a wide range of student-teacher pairs with different capacities and different architectures, including student networks with extremely low capacity. By distilling knowledge from a ResNet-50 teacher, we raise the CIFAR-100 accuracy of a ShuffleNetV2 student from a baseline of 69.8% to 74.55%. On ImageNet, a ResNet-34 teacher improves a ResNet-18 student from 68.88% to 70.32% accuracy (+1.44 percentage points).
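The abstract does not spell out which estimator is used, so what follows is only a minimal sketch of one common contrastive lower bound on mutual information, InfoNCE, applied to a batch of paired student and teacher feature vectors; it is not necessarily the estimator in MIMKD. It assumes both networks' features have already been projected to the same dimension, scores pairs with a temperature-scaled cosine similarity (the `temperature` default of 0.1 is a hypothetical choice), and handles a single global feature vector per input, omitting the local-feature pairing the paper describes. The names `info_nce_bound`, `dot`, and `l2_normalize` are illustrative. For a batch of N pairs the bound equals log N minus the average cross-entropy of picking the matching teacher feature for each student feature, so it never exceeds log N.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("student and teacher features must share a dimension")
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: Vector) -> List[float]:
    """Scale v to unit length so that dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v)) or 1.0  # leave an all-zero vector unchanged
    return [x / norm for x in v]


def info_nce_bound(student: List[Vector], teacher: List[Vector],
                   temperature: float = 0.1) -> float:
    """InfoNCE lower bound on I(student; teacher) for one batch of paired features.

    Row i of `student` and row i of `teacher` come from the same input (the
    positive pair); the other teacher rows in the batch act as negatives.
    Returns log(N) - L_NCE, which is at most log(N).
    """
    n = len(student)
    if n == 0 or n != len(teacher):
        raise ValueError("need two non-empty batches of the same size")
    s = [l2_normalize(v) for v in student]
    t = [l2_normalize(v) for v in teacher]
    nce_loss = 0.0
    for i in range(n):
        # Critic scores of student feature i against every teacher feature.
        scores = [dot(s[i], t[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(scores)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in scores))
        # Cross-entropy of identifying the matching teacher feature (index i).
        nce_loss += log_denominator - scores[i]
    nce_loss /= n
    return math.log(n) - nce_loss


# Toy check: three 2-D teacher features and student features that either
# match them row by row or are paired with the wrong rows.
teacher = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
shuffled = [aligned[1], aligned[2], aligned[0]]

print(info_nce_bound(aligned, teacher))   # just below log(3)
print(info_nce_bound(shuffled, teacher))  # much lower: mismatched pairs score poorly
```

In training, this quantity would be computed inside an autodiff framework and its negative added to the student's loss, so gradients pull each student feature toward its own teacher feature and away from the other teacher features in the batch. Because the bound is capped at log N, larger batches (more negatives) allow it to be tighter.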

Related content

Student Networks

ICLR 2022 accepted papers list released! All 1,095 papers are here!

Zhuanzhi Member Services

74+ reads · January 30, 2022

[ACL2021] Weight Distillation: Transferring Knowledge through Neural Network Weights

Zhuanzhi Member Services

20+ reads · August 17, 2021

[Google] Adversarial Robustness in Deep Learning, 43-page slide deck

Zhuanzhi Member Services

44+ reads · October 31, 2020

IJCAI 2020 accepted papers list: PDFs of all 592 papers are here!

Zhuanzhi Member Services

63+ reads · July 16, 2020

Distilling Knowledge from Graph Convolutional Networks (GCN)

Zhuanzhi Member Services

94+ reads · March 25, 2020

Bayesian Neural Networks With Maximum Mean Discrepancy Regularization

Zhuanzhi Member Services

53+ reads · March 5, 2020

[Paper | Knowledge Graphs] Few-Shot Knowledge Graph Completion

Zhuanzhi Member Services

117+ reads · November 30, 2019

[AAAI2020] Knowledge Graph Alignment Network with Gated Multi-hop Neighborhood Aggregation, Zequn Sun, Wei Hu

Zhuanzhi Member Services

59+ reads · November 25, 2019

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Zhuanzhi Member Services

45+ reads · October 17, 2019

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Zhuanzhi Member Services

32+ reads · October 17, 2019

Hierarchically Structured Meta-learning

CreateAMind

23+ reads · May 22, 2019

Transferring Knowledge across Learning Processes

CreateAMind

26+ reads · May 18, 2019

ICLR 2019 best papers announced

Zhuanzhi

11+ reads · May 6, 2019

Meta-learning for unsupervised representation learning

CreateAMind

26+ reads · January 4, 2019

Unsupervised Learning via Meta-Learning

CreateAMind

41+ reads · January 3, 2019

Meta-learning in 2017: MAML, SNAIL

CreateAMind

11+ reads · January 2, 2019

A discussion of the assumptions behind disentangled representations

CreateAMind

9+ reads · December 10, 2018

Hierarchical Disentangled Representations

CreateAMind

4+ reads · April 15, 2018

A major improvement to conditional GANs! cGANs with Projection Discriminator

CreateAMind

8+ reads · February 7, 2018

[Learning] Hierarchical Softmax

Machine Learning Research Society

4+ reads · August 6, 2017

Oracle Teacher: Towards Better Knowledge Distillation

Arxiv

0+ reads · February 1, 2022

Boosting of Head Pose Estimation by Knowledge Distillation

Arxiv

0+ reads · January 28, 2022

Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-guided Feature Imitation

Arxiv

11+ reads · December 9, 2021

Boosting Contrastive Learning with Relation Knowledge Distillation

Arxiv

9+ reads · December 8, 2021

Instance-Conditional Knowledge Distillation for Object Detection

Arxiv

8+ reads · October 25, 2021

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Arxiv

4+ reads · October 20, 2020

Contrastive Representation Distillation

Arxiv

5+ reads · October 23, 2019

Knowledge Distillation from Internal Representations

Arxiv

4+ reads · October 8, 2019

Extreme Language Model Compression with Optimal Subwords and Shared Projections

Arxiv

18+ reads · September 25, 2019

Implicit Maximum Likelihood Estimation

Arxiv

7+ reads · September 24, 2018
