Network pruning is an effective approach to reducing network complexity with an acceptable performance compromise. Existing studies achieve the sparsity of neural networks via time-consuming weight tuning or complex search on networks with expanded width, which greatly limits the applications of network pruning. In this paper, we show that high-performing and sparse sub-networks, termed ''lottery jackpots'', exist in pre-trained models with unexpanded width and without the involvement of weight tuning. For example, we obtain a lottery jackpot that has only 10% of the parameters and still reaches the performance of the original dense VGGNet-19 on CIFAR-10, without any modification of the pre-trained weights. Furthermore, we observe that the sparse masks derived from many existing pruning criteria have a high overlap with the searched mask of our lottery jackpot; among them, magnitude-based pruning yields the mask most similar to ours. Based on this insight, we initialize our sparse mask using magnitude-based pruning, resulting in at least a 3x cost reduction in the lottery jackpot search while achieving comparable or even better performance. Specifically, our magnitude-based lottery jackpot removes 90% of the weights in ResNet-50 while easily reaching more than 70% top-1 accuracy with only 10 search epochs on ImageNet. Our code is available at https://github.com/zyxxmu/lottery-jackpots.
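The abstract's mask initialization via magnitude-based pruning can be sketched as follows. This is a minimal illustration, not the paper's actual search procedure: the function name `magnitude_mask` and the tiny example weights are hypothetical, and the subsequent mask search over the pre-trained weights is omitted.

```python
import numpy as np

def magnitude_mask(weights, sparsity):
    """Build a binary mask that keeps the largest-magnitude weights.

    `sparsity` is the fraction of weights to prune (set to zero).
    """
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)  # number of weights to prune
    if k == 0:
        return np.ones_like(weights)
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    return (np.abs(weights) > threshold).astype(weights.dtype)

# Toy example: prune half of a 2x2 weight matrix
w = np.array([[0.5, -0.05], [0.01, -0.9]])
mask = magnitude_mask(w, sparsity=0.5)
# mask keeps the two largest-magnitude entries, 0.5 and -0.9
```

In the paper's setting, such a mask would serve as the starting point of the jackpot search instead of a random initialization, which is what yields the reported cost reduction.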