Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks

Compressing large neural networks is an important step for their deployment in resource-constrained computational platforms. In this context, vector quantization is an appealing framework that expresses multiple parameters using a single code, and has recently achieved state-of-the-art network compression on a range of core vision and natural language processing tasks. Key to the success of vector quantization is deciding which parameter groups should be compressed together. Previous work has relied on heuristics that group the spatial dimension of individual convolutional filters, but a general solution remains unaddressed. This is desirable for pointwise convolutions (which dominate modern architectures), linear layers (which have no notion of spatial dimension), and convolutions (when more than one filter is compressed to the same codeword). In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function. We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress. Finally, we rely on an annealed quantization algorithm to better compress the network and achieve higher final accuracy. We show results on image classification, object detection, and segmentation, reducing the gap with the uncompressed model by 40 to 70% with respect to the current state of the art.
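The observation that two adjacent layers can be permuted without changing the function can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes a two-layer fully connected network with a ReLU in between, and the names `two_layer_mlp` and `permute_hidden_units` are hypothetical, introduced only for illustration. Reordering the output units of the first layer and applying the same reordering to the input rows of the second layer leaves the output unchanged up to floating-point rounding.

```python
import numpy as np


def two_layer_mlp(x, w1, b1, w2, b2):
    """Forward pass of a toy network: linear -> ReLU -> linear."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2


def permute_hidden_units(w1, b1, w2, perm):
    """Reorder hidden units consistently across the two adjacent layers.

    Columns of w1 (and entries of b1) produce the hidden units; rows of w2
    consume them, so both must be indexed with the same permutation.
    """
    return w1[:, perm], b1[perm], w2[perm, :]


# Toy sizes chosen arbitrarily for the illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w1 = rng.normal(size=(8, 16))
b1 = rng.normal(size=16)
w2 = rng.normal(size=(16, 3))
b2 = rng.normal(size=3)

perm = rng.permutation(16)
w1_p, b1_p, w2_p = permute_hidden_units(w1, b1, w2, perm)

out = two_layer_mlp(x, w1, b1, w2, b2)
out_p = two_layer_mlp(x, w1_p, b1_p, w2_p, b2)

# The two outputs differ only by floating-point summation order.
max_abs_diff = float(np.max(np.abs(out - out_p)))
print("max |difference|:", max_abs_diff)
```

Because every such permutation yields the same function, the choice of permutation is free, which is what lets the paper search for an ordering whose weight groups are easier to quantize.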

