BiT: Robustly Binarized Multi-distilled Transformer - Zhuanzhi Papers

October 2, 2022

BiT: Robustly Binarized Multi-distilled Transformer

Zechun Liu, Barlas Oguz, Aasish Pappu, Lin Xiao, Scott Yih, Meng Li, Raghuraman Krishnamoorthi, Yashar Mehdad

From arXiv, NeurIPS 2022

Modern pre-trained transformers have rapidly advanced the state of the art in machine learning, but they have also grown in parameter count and computational complexity, making them increasingly difficult to deploy in resource-constrained environments. Binarizing the weights and activations of the network can significantly alleviate these issues, but it is technically challenging from an optimization perspective. In this work, we identify a series of improvements that enable binary transformers at a much higher accuracy than was previously possible. These include a two-set binarization scheme, a novel elastic binary activation function with learned parameters, and a method to quantize a network to its limit by successively distilling higher-precision models into lower-precision students. These approaches allow, for the first time, fully binarized transformer models at a practical level of accuracy, approaching a full-precision BERT baseline on the GLUE language understanding benchmark to within as little as 5.9%. Code and models are available at: https://github.com/facebookresearch/bit.
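The abstract names a two-set binarization scheme and an elastic binary activation with learned parameters but gives no formulas. The sketch below is only one plausible reading, not the authors' implementation: it assumes a positive scale `alpha` and a threshold `beta` (both hypothetical names, shown here as fixed numbers rather than learned values), and it assumes the two sets are {-alpha, +alpha} for signed inputs and {0, alpha} for non-negative inputs such as attention probabilities.

```python
from typing import List


def binarize_symmetric(xs: List[float], alpha: float, beta: float = 0.0) -> List[float]:
    """Map each value to -alpha or +alpha depending on its sign after shifting by beta."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return [alpha if (x - beta) >= 0 else -alpha for x in xs]


def binarize_nonnegative(xs: List[float], alpha: float, beta: float = 0.0) -> List[float]:
    """Map each value to 0 or alpha.

    The level is 1 when (x - beta) / alpha reaches 0.5, which matches
    rounding to the nearest integer and clipping the result to [0, 1].
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return [alpha if (x - beta) / alpha >= 0.5 else 0.0 for x in xs]


def binarize(xs: List[float], alpha: float, beta: float = 0.0,
             nonnegative: bool = False) -> List[float]:
    """Pick one of the two output sets based on the (assumed) input range."""
    if nonnegative:
        return binarize_nonnegative(xs, alpha, beta)
    return binarize_symmetric(xs, alpha, beta)


if __name__ == "__main__":
    # Signed activations use the {-alpha, +alpha} set.
    print(binarize([-1.2, 0.4, 2.0], alpha=0.8))
    # Non-negative activations use the {0, alpha} set.
    print(binarize([0.1, 0.3, 0.9], alpha=0.5, nonnegative=True))
```

In an actual training setup, `alpha` and `beta` would be trainable and the non-differentiable step would need a gradient approximation; the released code at the URL above is the authoritative reference for those details.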
