Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory, and energy both during training and at test time. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks at initialization. This raises a foundational question: can we identify highly sparse trainable subnetworks at initialization, without ever training, or indeed without ever looking at the data? We provide an affirmative answer to this question through theory-driven algorithm design. We first mathematically formulate and experimentally verify a conservation law that explains why existing gradient-based pruning algorithms at initialization suffer from layer-collapse, the premature pruning of an entire layer rendering a network untrainable. This theory also elucidates how layer-collapse can be entirely avoided, motivating a novel pruning algorithm, Iterative Synaptic Flow Pruning (SynFlow). This algorithm can be interpreted as preserving the total flow of synaptic strengths through the network at initialization subject to a sparsity constraint. Notably, this algorithm makes no reference to the training data and consistently competes with or outperforms existing state-of-the-art pruning algorithms at initialization over a range of models (VGG and ResNet), datasets (CIFAR-10/100 and Tiny ImageNet), and sparsity constraints (up to 99.99 percent). Thus our data-agnostic pruning algorithm challenges the existing paradigm that, at initialization, data must be used to quantify which synapses are important.
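To make the "total flow of synaptic strengths" idea concrete, here is a minimal NumPy sketch of a SynFlow-style saliency score for a plain chain of linear layers. The function name `synflow_scores`, the closed-form forward/backward passes, and the omission of biases, nonlinearities, and the iterative pruning schedule are all simplifications for illustration, not the authors' implementation. The objective is R = 1ᵀ|W_L|⋯|W_1|1, the flow reaching the output when an all-ones vector is fed through the element-wise absolute network, and each weight is scored by |∂R/∂W ⊙ W|.

```python
import numpy as np

def synflow_scores(weights):
    """SynFlow-style saliency scores for a chain of linear layers (illustrative).

    R = 1^T |W_L| ... |W_1| 1 is the total synaptic flow on an all-ones
    input. Each weight's score is |dR/dW * W|, which for this purely
    linear chain has a closed form: an outer product of a forward flow
    vector and a backward sensitivity vector, times |W|.
    """
    abs_w = [np.abs(W) for W in weights]

    # Forward pass on an all-ones input: v[l] is the flow entering layer l.
    v = [np.ones(abs_w[0].shape[1])]
    for A in abs_w:
        v.append(A @ v[-1])

    # Backward pass from an all-ones output: u[l] is the sensitivity
    # at the output of layer l.
    u = [np.ones(abs_w[-1].shape[0])]
    for A in reversed(abs_w[1:]):
        u.append(A.T @ u[-1])
    u.reverse()

    # dR/d|W_l| = outer(u_l, v_l); score = gradient times |W_l|.
    return [np.outer(u[l], v[l]) * abs_w[l] for l in range(len(abs_w))]

# Toy example: a 2-2-1 linear chain.
W1 = np.array([[1.0, -2.0], [3.0, 4.0]])
W2 = np.array([[1.0, 1.0]])
scores = synflow_scores([W1, W2])
```

One property worth noting: the scores in every layer sum to the same value R, which is exactly the layer-wise conservation law the abstract invokes to explain why layer-collapse can be avoided. Iteratively removing the globally lowest-scoring weights and recomputing scores then preserves flow under a sparsity constraint.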