Despite the success of deep neural networks (DNNs), state-of-the-art models are too large to deploy on low-resource devices or in common server configurations in which multiple models are held in memory. Model compression methods address this limitation by reducing a model's memory footprint, latency, or energy consumption with minimal impact on accuracy. We focus on the task of reducing the number of learnable variables in the model. In this work we combine ideas from weight hashing and dimensionality reduction, resulting in a simple and powerful structured multi-hashing method based on matrix products that allows direct control over the size of any deep network and is trained end-to-end. We demonstrate the strength of our approach by compressing models from the ResNet, EfficientNet, and MobileNet architecture families. Our method allows us to drastically decrease the number of variables while maintaining high accuracy. For instance, by applying our approach to EfficientNet-B4 (16M parameters) we reduce it to the size of B0 (5M parameters) while gaining over 3% in accuracy over the B0 baseline. On the commonly used CIFAR10 benchmark we reduce the ResNet32 model by 75% with no loss in quality, and achieve 10x compression while still maintaining above 90% accuracy.
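To make the core idea concrete, the following is a minimal sketch of the matrix-product/parameter-sharing aspect only (it does not implement the paper's hashing scheme): a large "virtual" weight matrix is materialized from a much smaller set of trainable parameters via a product of small factors, which gives direct control over the total parameter count. All names and shapes here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Virtual" layer size the network sees: a 256 x 256 weight matrix (65,536 weights).
n_out, n_in = 256, 256
# Inner dimension controlling how many real parameters back that matrix.
r = 16

# Trainable parameters: two small factors instead of one large matrix.
U = rng.standard_normal((n_out, r))
V = rng.standard_normal((r, n_in))

# The full weight matrix is generated on the fly as a matrix product,
# so gradients flow back to the small factors during end-to-end training.
W = U @ V

# Virtual weights per real trainable parameter.
compression = W.size / (U.size + V.size)
print(W.shape, compression)  # (256, 256) 8.0
```

Shrinking or growing `r` changes the real parameter count without altering the layer shapes the rest of the network depends on, which is the sense in which model size becomes directly controllable.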