Deep neural networks (DNNs) are nowadays ubiquitous in many domains such as computer vision. However, due to their high latency, the deployment of DNNs hinges on the development of compression techniques such as quantization, which consists in reducing the number of bits used to encode the weights and activations. Growing concerns over privacy and security have motivated the development of data-free techniques, at the expense of accuracy. In this paper, we identify the uniformity of the quantization operator as a limitation of existing approaches, and propose a data-free non-uniform method. More specifically, we argue that, to be readily usable without dedicated hardware and implementation, non-uniform quantization should not change the nature of the mathematical operations performed by the DNN. This leads us to search among the continuous automorphisms of $(\mathbb{R}_+^*,\times)$, which boil down to power functions defined by their exponent. To find this parameter, we propose to optimize the reconstruction error of each layer: in particular, we show that this procedure is locally convex and admits a unique solution. At inference time, we show that our approach, dubbed PowerQuant, only requires simple modifications of the quantized DNN activation functions. As such, with only negligible overhead, it significantly outperforms existing methods in a variety of configurations.
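To make the idea concrete, here is a minimal sketch of power-function (non-uniform) quantization of a weight tensor: values are mapped through $|w|^a$, uniformly quantized in that transformed domain, then mapped back through the inverse power. The function names, the per-tensor scale, and the grid search over the exponent are illustrative assumptions for this sketch (the paper instead solves a locally convex reconstruction-error problem per layer), not the authors' implementation.

```python
import numpy as np

def power_quantize(w, n_bits=8, exponent=0.5):
    """Quantize w by reparametrizing |w| as |w|**exponent (a continuous
    automorphism of (R_+^*, x)), applying standard uniform quantization in
    that domain, then inverting the power transform at de-quantization."""
    sign = np.sign(w)
    x = np.abs(w) ** exponent              # non-uniform transform
    scale = x.max() / (2 ** n_bits - 1)    # per-tensor scale (illustrative choice)
    q = np.round(x / scale)                # integer codes
    return sign * (q * scale) ** (1.0 / exponent)  # de-quantized approximation of w

def search_exponent(w, n_bits=8, grid=np.linspace(0.1, 1.0, 91)):
    """Pick the exponent that minimizes the L2 reconstruction error of w.
    A simple grid search stands in for the layer-wise optimization described
    in the paper, which is shown to be locally convex with a unique solution."""
    errors = [np.sum((w - power_quantize(w, n_bits, a)) ** 2) for a in grid]
    return grid[int(np.argmin(errors))]
```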