A Survey of Model Compression and Acceleration for Deep Neural Networks

Deep convolutional neural networks (CNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, which hinders their deployment on devices with limited memory or in applications with strict latency requirements. A natural approach is therefore to compress and accelerate deep networks without significantly degrading model performance. Tremendous progress has been made in this area over the past few years. In this paper, we survey recent techniques for compacting and accelerating CNN models. These techniques are roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and sharing are described first, followed by the other techniques. For each scheme, we analyze the performance, related applications, advantages, and drawbacks. We then go through a few very recent additional successful methods, for example, dynamic capacity networks and stochastic depth networks. After that, we survey the evaluation metrics, the main datasets used for evaluating model performance, and recent benchmarking efforts. Finally, we conclude the paper and discuss remaining challenges and possible directions on this topic.
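To make the first of the four schemes concrete, the following is a minimal sketch of magnitude-based pruning, one common criterion for parameter pruning. It is an illustration only and is not taken from the survey: the function names `prune_by_magnitude` and `to_sparse`, the toy weight vector, and the 50% sparsity level are all hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Hypothetical helper for illustration; real frameworks prune tensors
    layer by layer and usually fine-tune the network afterwards.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending absolute value; the first n_prune are dropped.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def to_sparse(weights):
    """Keep only the nonzero entries as (index, value) pairs."""
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]


# Toy weight vector (assumed values, not from the paper).
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = prune_by_magnitude(weights, sparsity=0.5)
sparse = to_sparse(pruned)
```

Here the three smallest-magnitude weights are set to zero, so the sparse form stores three entries instead of six. Whether accuracy survives a given sparsity level depends on the network and typically requires retraining, which this sketch does not attempt.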
