Diffusion models achieve state-of-the-art image generation but remain computationally costly due to iterative denoising. Latent-space models like Stable Diffusion reduce overhead yet lose fine detail, while retrieval-augmented methods improve efficiency but rely on large memory banks, static similarity models, and rigid infrastructures. We introduce the Prototype Diffusion Model (PDM), which embeds prototype learning into the diffusion process to provide adaptive, memory-free conditioning. Instead of retrieving references, PDM learns compact visual prototypes from clean features via contrastive learning, then aligns noisy representations with semantically relevant patterns during denoising. Experiments demonstrate that PDM sustains high generation quality while lowering computational and storage costs, offering a scalable alternative to retrieval-based conditioning.
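To make the mechanism concrete, below is a minimal sketch of the idea described in the abstract: learnable prototypes trained against clean features with a contrastive objective, then used to condition the denoiser by soft alignment with noisy features. This is an illustrative assumption, not the paper's actual implementation: the class and parameter names (`PrototypeBank`, `num_prototypes`, `temperature`), the InfoNCE-style loss with hard pseudo-labels, and the soft-attention conditioning are all hypothetical stand-ins for details the abstract does not specify.

```python
# Hypothetical sketch of prototype-based conditioning for a diffusion model.
# Shapes, names, and the loss choice are assumptions, not the published PDM design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeBank(nn.Module):
    """Learnable visual prototypes trained on clean features (assumed design)."""

    def __init__(self, num_prototypes: int = 256, dim: int = 512, temperature: float = 0.1):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim))
        self.temperature = temperature

    def contrastive_loss(self, clean_feats: torch.Tensor) -> torch.Tensor:
        """InfoNCE-style objective: each clean feature is pulled toward its
        closest prototype and pushed from the rest (hard pseudo-labels are a
        simplification of whatever assignment scheme the paper uses)."""
        z = F.normalize(clean_feats, dim=-1)        # (B, D)
        p = F.normalize(self.prototypes, dim=-1)    # (K, D)
        logits = z @ p.t() / self.temperature       # (B, K) similarities
        targets = logits.argmax(dim=-1).detach()    # nearest prototype as pseudo-label
        return F.cross_entropy(logits, targets)

    def condition(self, noisy_feats: torch.Tensor) -> torch.Tensor:
        """Align noisy denoiser features with the prototypes via soft attention
        and return a prototype-weighted conditioning signal (no retrieval bank)."""
        z = F.normalize(noisy_feats, dim=-1)
        p = F.normalize(self.prototypes, dim=-1)
        weights = F.softmax(z @ p.t() / self.temperature, dim=-1)  # (B, K)
        return weights @ self.prototypes            # (B, D) conditioning vector


if __name__ == "__main__":
    bank = PrototypeBank()
    clean = torch.randn(8, 512)   # features of clean images, e.g. from an encoder
    noisy = torch.randn(8, 512)   # intermediate denoiser features at some timestep
    loss = bank.contrastive_loss(clean)   # prototype training signal
    cond = bank.condition(noisy)          # conditioning injected into the denoiser
    print(loss.item(), cond.shape)
```

In this reading, the prototype bank replaces an external retrieval memory: conditioning costs one matrix product against a fixed, compact set of learned vectors rather than a nearest-neighbor search over a large feature store.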