Although diffusion models with strong visual priors have emerged as powerful dense prediction backbones, existing approaches overlook a core limitation: the stochastic noise at the heart of diffusion sampling is inherently misaligned with dense prediction, which requires a deterministic mapping from image to geometry. In this paper, we show that this stochastic noise corrupts fine-grained spatial cues and pushes the model toward timestep-specific noise objectives, thereby destroying meaningful geometric structure mappings. To address this, we introduce $\mathrm{D}^\mathrm{3}$-Predictor, a noise-free deterministic framework built by reformulating a pretrained diffusion model to remove stochastic noise. Instead of relying on noisy inputs to leverage diffusion priors, $\mathrm{D}^\mathrm{3}$-Predictor views the pretrained diffusion network as an ensemble of timestep-dependent visual experts and aggregates their heterogeneous priors, in a self-supervised manner, into a single, clean, and complete geometric prior. We then use task-specific supervision to adapt this noise-free prior to dense prediction tasks. Extensive experiments on various dense prediction tasks demonstrate that $\mathrm{D}^\mathrm{3}$-Predictor achieves competitive or state-of-the-art performance in diverse scenarios. In addition, it requires less than half the training data used by previous methods and performs inference in a single step. Our code, data, and checkpoints are publicly available at https://x-gengroup.github.io/HomePage_D3-Predictor/.