Large-scale vision generative models, including diffusion and flow models, have demonstrated remarkable performance in visual generation tasks. However, transferring these pre-trained models to downstream tasks often results in significant parameter redundancy. In this paper, we propose EntPruner, an entropy-guided automatic progressive pruning framework for diffusion and flow models. First, we introduce entropy-guided pruning, a block-level importance assessment strategy specifically designed for generative models. Unlike discriminative models, generative models must preserve the diversity and condition fidelity of the output distribution. As the importance of each module can vary significantly across downstream tasks, EntPruner prioritizes pruning of less important blocks using data-dependent Conditional Entropy Deviation (CED) as a guiding metric. CED quantifies how much the output distribution diverges from the learned conditional data distribution after a block is removed. Second, we propose a zero-shot adaptive pruning framework that automatically determines when and how much to prune during training. This dynamic strategy avoids the pitfalls of one-shot pruning, mitigating mode collapse and preserving model performance. Extensive experiments on DiT and SiT models demonstrate the effectiveness of EntPruner, achieving up to 2.22$\times$ inference speedup while maintaining competitive generation quality on ImageNet and three downstream datasets.
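The block-scoring idea can be illustrated with a minimal sketch. Here CED is approximated as the change in conditional entropy of the model's output distribution when a candidate block is ablated; the paper's exact formula is not reproduced here, and all function names (`conditional_entropy`, `ced_score`, `rank_blocks`) are illustrative assumptions, not the authors' API.

```python
# Hypothetical sketch of entropy-guided block scoring. Assumption: CED is
# measured as the absolute deviation in conditional entropy between the
# full model's output distribution and the distribution obtained after
# ablating one block. Blocks with low CED perturb the learned conditional
# distribution least and are pruned first.
import math


def conditional_entropy(probs):
    """Shannon entropy (nats) of a discrete conditional distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def ced_score(full_probs, ablated_probs):
    """Conditional Entropy Deviation: how far the ablated model's
    conditional entropy drifts from the full model's."""
    return abs(conditional_entropy(ablated_probs) - conditional_entropy(full_probs))


def rank_blocks(full_probs, ablated_by_block):
    """Order blocks from best to worst pruning candidate (lowest CED first)."""
    scores = {b: ced_score(full_probs, p) for b, p in ablated_by_block.items()}
    return sorted(scores, key=scores.get)
```

A block whose removal leaves the conditional distribution essentially unchanged gets a near-zero CED and is pruned before blocks whose removal sharpens or flattens the distribution.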