Diffusion models demonstrate remarkable capabilities in capturing complex data distributions and have achieved compelling results in many generative tasks. While they have recently been extended to dense prediction tasks such as depth estimation and surface normal prediction, their full potential in this area remains underexplored. Because target signal maps and input images are pixel-wise aligned, the conventional noise-to-data generation paradigm is inefficient: the input image itself is a more informative prior than pure noise. Diffusion bridge models, which support data-to-data generation between two general data distributions, offer a promising alternative, but they typically fail to exploit the rich visual priors embedded in large pretrained foundation models. To address these limitations, we integrate the diffusion bridge formulation with structured visual priors and introduce DPBridge, the first latent diffusion bridge framework for dense prediction tasks. To resolve the incompatibility between diffusion bridge models and pretrained diffusion backbones, we propose (1) a tractable reverse transition kernel for the diffusion bridge process, enabling a maximum likelihood training scheme; and (2) finetuning strategies including distribution-aligned normalization and an image consistency loss. Experiments across extensive benchmarks validate that our method consistently achieves superior performance, demonstrating its effectiveness and generalization capability under different scenarios.