Modern deep neural networks rely on massive numbers of model weights and training samples, incurring substantial computational costs. Weight pruning and coreset selection are two emerging paradigms proposed to improve computational efficiency. In this paper, we first explore the interplay between redundant weights and redundant training samples through a transparent analysis: redundant samples, particularly noisy ones, cause model weights to become unnecessarily overtuned to fit them, complicating the identification of irrelevant weights during pruning; conversely, irrelevant weights tend to overfit noisy data, undermining the effectiveness of coreset selection. To further investigate and harness this interplay in deep learning, we develop a Simultaneous Weight and Sample Tailoring mechanism (SWaST) that alternately performs weight pruning and coreset selection to establish a synergistic effect in training. During this investigation, we observe that when a large number of weights and samples are removed simultaneously, a phenomenon we term critical double-loss can occur, in which important weights and their supportive samples are mistakenly eliminated at the same time, leading to model instability and degradation that is nearly impossible to recover from in subsequent training. Unlike in classic machine learning models, this issue can arise in deep learning because weight pruning and coreset selection lack theoretical guarantees of correctness, which explains why these paradigms have typically been developed independently. We mitigate this by integrating a state preservation mechanism into SWaST, enabling stable joint optimization. Extensive experiments reveal a strong synergy between pruning and coreset selection across varying prune rates and coreset sizes, delivering accuracy boosts of up to 17.83% alongside 10% to 90% FLOPs reductions.
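To make the alternating scheme concrete, the following is a minimal sketch of one possible weight-and-sample tailoring loop. It is illustrative only and does not reproduce the paper's SWaST criteria: magnitude-based pruning, per-sample-loss coreset selection, the toy data, the 10%/90% ratios, and the checkpoint-based rollback used to stand in for state preservation are all placeholder assumptions.

```python
# Hypothetical sketch of alternating weight pruning and coreset selection,
# NOT the authors' implementation. Pruning/selection criteria and the
# rollback-style "state preservation" step are placeholder assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
X, y = torch.randn(512, 20), torch.randint(0, 2, (512,))            # toy data
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

active = torch.arange(len(X))                       # current coreset (all samples at first)
masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}

def evaluate(idx):
    # Mean loss on a set of sample indices (used to detect destabilization).
    with torch.no_grad():
        return F.cross_entropy(model(X[idx]), y[idx]).item()

for epoch in range(20):
    # 1) Train on the current coreset.
    opt.zero_grad()
    F.cross_entropy(model(X[active]), y[active]).backward()
    opt.step()
    with torch.no_grad():                           # keep previously pruned weights at zero
        for n, p in model.named_parameters():
            p.mul_(masks[n])

    # Snapshot before the joint tailoring step (illustrating state preservation).
    snapshot = {k: v.clone() for k, v in model.state_dict().items()}
    mask_snapshot = {k: v.clone() for k, v in masks.items()}
    loss_before = evaluate(active)

    # 2) Weight pruning: zero the 10% smallest-magnitude weights (placeholder criterion).
    with torch.no_grad():
        for n, p in model.named_parameters():
            thresh = p.abs().flatten().quantile(0.10)
            masks[n] = (p.abs() > thresh).float()
            p.mul_(masks[n])

    # 3) Coreset selection: keep the 90% lowest-loss samples (placeholder criterion).
    with torch.no_grad():
        per_sample = F.cross_entropy(model(X), y, reduction="none")
    active = per_sample.argsort()[: int(0.9 * len(X))]

    # 4) State preservation (illustrative): roll back if removing weights and
    #    samples together destabilizes the model.
    if evaluate(active) > 2.0 * loss_before:
        model.load_state_dict(snapshot)
        masks = mask_snapshot
```

In this sketch, the rollback condition is a crude proxy for critical double-loss: if the joint removal of weights and samples sharply degrades the loss on the retained coreset, the pre-tailoring state is restored before training continues.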