Recent few-step diffusion models have demonstrated both efficiency and effectiveness by shortcutting the probabilistic paths of diffusion models, especially when one-step diffusion models are trained from scratch (\emph{a.k.a.} shortcut models). However, their theoretical derivations and practical implementations are often tightly coupled, which obscures the design space. To address this, we propose a common design framework for representative shortcut models. The framework provides theoretical justification for their validity and disentangles concrete component-level choices, thereby enabling improvements to be identified systematically. With our proposed improvements, the resulting one-step model achieves a new state-of-the-art FID50k of 2.85 on ImageNet-256x256 with one-step generation under classifier-free guidance, and further reaches an FID50k of 2.53 with 2x the training steps. Remarkably, the model requires no pre-training, distillation, or curriculum learning. We believe our work lowers the barrier to component-level innovation in shortcut models and facilitates principled exploration of their design space.