提高培训后培训后对培训前语文模式的量化效率 (Towards Efficient Post-training Quantization of Pre-trained Language Models) - 专知论文

会员服务 ·

0

语言模型化 · MoDELS · 重构误差 · Performer · 模型并行 ·

2021 年 9 月 30 日

Towards Efficient Post-training Quantization of Pre-trained Language Models

翻译：提高培训后培训后对培训前语文模式的量化效率

Haoli Bai,Lu Hou,Lifeng Shang,Xin Jiang,Irwin King,Michael R. Lyu

Network quantization has gained increasing attention with the rapid growth of large pre-trained language models~(PLMs). However, most existing quantization methods for PLMs follow quantization-aware training~(QAT) that requires end-to-end training with full access to the entire dataset. Therefore, they suffer from slow training, large memory overhead, and data security issues. In this paper, we study post-training quantization~(PTQ) of PLMs, and propose module-wise quantization error minimization~(MREM), an efficient solution to mitigate these issues. By partitioning the PLM into multiple modules, we minimize the reconstruction error incurred by quantization for each module. In addition, we design a new model parallel training strategy such that each module can be trained locally on separate computing devices without waiting for preceding modules, which brings nearly the theoretical training speed-up (e.g., $4\times$ on $4$ GPUs). Experiments on GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption.

翻译：随着经过培训的大型语言模型的迅速增长,网络的量化得到了越来越多的关注。然而,目前对PLMS的现有量化方法大多采用量化认知培训(QAT)方法,这需要全能访问整个数据集的端对端培训(QAT),因此,它们受到缓慢的培训、记忆管理以及数据安全问题的困扰。在本文中,我们研究了对PLMS的培训后量化(PTQ)方法,并提出将模块性量化错误最小化(MREM),这是缓解这些问题的一个有效解决方案。通过将PLM分成多个模块,我们最大限度地减少每个模块的重组错误。此外,我们设计了一个新的模型平行培训战略,使每个模块可以在不等待前一个模块的情况下就单独的计算设备在当地接受培训,这几乎可以带来理论培训速度(例如,4美元对4美元GPUPUS)的提升。关于GLUE和SQOAD基准的实验表明,我们提议的PTQQ的解决方案不仅接近QAT,而且在培训时间、记忆、数据以及消费方面大幅减少。

0

相关内容

语言模型化

语言模型化

【CVPR2021】自监督几何感知

【CVPR2021】自监督几何感知

专知会员服务

45+阅读 · 2021年3月6日

【CVPR2021】用Transformers无监督预训练进行目标检测

【CVPR2021】用Transformers无监督预训练进行目标检测

专知会员服务

55+阅读 · 2021年3月3日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

276+阅读 · 2020年11月26日

【Google】平滑对抗训练，Smooth Adversarial Training

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

46+阅读 · 2020年7月4日

【CMU博士论文】用动态超参数优化改进深度学习训练和推理，Improving Deep Learning Training and Inference with Dynamic Hyperparameter Optimization

【CMU博士论文】用动态超参数优化改进深度学习训练和推理，Improving Deep Learning Training and Inference with Dynamic Hyperparameter Optimization

专知会员服务

54+阅读 · 2020年5月26日

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

专知会员服务

48+阅读 · 2020年5月3日

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

专知会员服务

49+阅读 · 2020年3月7日

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

专知会员服务

31+阅读 · 2020年2月21日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

52+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

25+阅读 · 2019年5月18日

凸优化及无约束最优化

凸优化及无约束最优化

AINLP

3+阅读 · 2019年2月15日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

17+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

19+阅读 · 2017年12月17日

资源｜斯坦福课程：深度学习理论！

资源｜斯坦福课程：深度学习理论！

全球人工智能

16+阅读 · 2017年11月9日

【推荐】斯坦福课程：深度学习理论（附视频+讲义+阅读材料）

【推荐】斯坦福课程：深度学习理论（附视频+讲义+阅读材料）

机器学习研究会

9+阅读 · 2017年11月8日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

HERO: Hessian-Enhanced Robust Optimization for Unifying and Improving Generalization and Quantization Performance

Arxiv

0+阅读 · 2021年11月23日

On Training Implicit Models

Arxiv

0+阅读 · 2021年11月22日

Lottery Jackpots Exist in Pre-trained Models

Arxiv

0+阅读 · 2021年11月22日

Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval

Arxiv

6+阅读 · 2021年10月12日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

A Survey of Quantization Methods for Efficient Neural Network Inference

Arxiv

21+阅读 · 2021年6月21日

Go Wide, Then Narrow: Efficient Training of Deep Thin Networks

Arxiv

15+阅读 · 2020年7月1日

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Arxiv

11+阅读 · 2020年6月23日

Universal Transformers

Universal Transformers

Arxiv

5+阅读 · 2019年3月5日

Reducing Parameter Space for Neural Network Training

Arxiv

3+阅读 · 2018年8月17日

VIP会员

文章信息

相关主题

语言模型化

相关VIP内容

【CVPR2021】自监督几何感知

【CVPR2021】自监督几何感知

专知会员服务

45+阅读 · 2021年3月6日

【CVPR2021】用Transformers无监督预训练进行目标检测

【CVPR2021】用Transformers无监督预训练进行目标检测

专知会员服务

55+阅读 · 2021年3月3日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

276+阅读 · 2020年11月26日

【Google】平滑对抗训练，Smooth Adversarial Training

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

46+阅读 · 2020年7月4日

【CMU博士论文】用动态超参数优化改进深度学习训练和推理，Improving Deep Learning Training and Inference with Dynamic Hyperparameter Optimization

【CMU博士论文】用动态超参数优化改进深度学习训练和推理，Improving Deep Learning Training and Inference with Dynamic Hyperparameter Optimization

专知会员服务

54+阅读 · 2020年5月26日

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

【微软】大型神经语言模型的对抗性训练，Adversarial Training for Large Neural Language Models

专知会员服务

48+阅读 · 2020年5月3日

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

专知会员服务

49+阅读 · 2020年3月7日

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

专知会员服务

31+阅读 · 2020年2月21日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

52+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

热门VIP内容

相关资讯

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

25+阅读 · 2019年5月18日

凸优化及无约束最优化

凸优化及无约束最优化

AINLP

3+阅读 · 2019年2月15日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

17+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

19+阅读 · 2017年12月17日

资源｜斯坦福课程：深度学习理论！

资源｜斯坦福课程：深度学习理论！

全球人工智能

16+阅读 · 2017年11月9日

【推荐】斯坦福课程：深度学习理论（附视频+讲义+阅读材料）

【推荐】斯坦福课程：深度学习理论（附视频+讲义+阅读材料）

机器学习研究会

9+阅读 · 2017年11月8日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

HERO: Hessian-Enhanced Robust Optimization for Unifying and Improving Generalization and Quantization Performance

Arxiv

0+阅读 · 2021年11月23日

On Training Implicit Models

Arxiv

0+阅读 · 2021年11月22日

Lottery Jackpots Exist in Pre-trained Models

Arxiv

0+阅读 · 2021年11月22日

Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval

Arxiv

6+阅读 · 2021年10月12日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

A Survey of Quantization Methods for Efficient Neural Network Inference

Arxiv

21+阅读 · 2021年6月21日

Go Wide, Then Narrow: Efficient Training of Deep Thin Networks

Arxiv

15+阅读 · 2020年7月1日

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Arxiv

11+阅读 · 2020年6月23日

Universal Transformers

Universal Transformers

Arxiv

5+阅读 · 2019年3月5日

Reducing Parameter Space for Neural Network Training

Arxiv

3+阅读 · 2018年8月17日

微信扫码咨询专知VIP会员