Pre-trained Latent Diffusion Models (LDMs) have recently shown strong perceptual priors for low-level vision tasks, making them a promising direction for multi-exposure High Dynamic Range (HDR) reconstruction. However, directly applying LDMs to HDR reconstruction remains challenging for three reasons: (1) limited dynamic-range representation caused by 8-bit latent compression, (2) high inference cost from multi-step denoising, and (3) content hallucination inherent to their generative nature. To address these challenges, we introduce GMODiff, a gain map-driven one-step diffusion framework for multi-exposure HDR reconstruction. Instead of reconstructing full HDR content, we reformulate HDR reconstruction as a conditionally guided Gain Map (GM) estimation task, where the GM encodes the extended dynamic range while retaining the same bit depth as Low Dynamic Range (LDR) images. We initialize the denoising process from an informative regression-based estimate rather than pure noise, enabling the model to generate high-quality GMs in a single denoising step. Furthermore, recognizing that regression-based models excel in content fidelity while LDMs favor perceptual quality, we leverage regression priors to guide both the denoising process and the latent decoding of the LDM, suppressing hallucinations while preserving structural accuracy. Extensive experiments demonstrate that GMODiff performs favorably against several state-of-the-art methods and is 100× faster than previous LDM-based methods.
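To make the gain map idea concrete, the following is a minimal sketch of how an 8-bit GM can carry extended dynamic range relative to an LDR image. The abstract does not give GMODiff's actual GM formulation, so the log2 encoding, the offset `EPS`, the range `max_log2_gain`, and the function names `encode_gain_map` and `apply_gain_map` are all illustrative assumptions rather than the authors' method.

```python
import numpy as np

EPS = 1.0 / 64.0  # assumed small offset to avoid division by zero (not from the paper)


def encode_gain_map(ldr, hdr, max_log2_gain=4.0):
    """Encode the per-pixel HDR/LDR ratio as an 8-bit gain map (hypothetical scheme)."""
    # Work in the log2 domain so a fixed number of bits covers a wide ratio range.
    log_gain = np.log2((hdr + EPS) / (ldr + EPS))
    # Normalize to [0, 1]; gains outside [0, max_log2_gain] stops are clipped.
    norm = np.clip(log_gain / max_log2_gain, 0.0, 1.0)
    return np.round(norm * 255.0).astype(np.uint8)


def apply_gain_map(ldr, gm_u8, max_log2_gain=4.0):
    """Recover an HDR estimate from the LDR image and its 8-bit gain map."""
    log_gain = gm_u8.astype(np.float64) / 255.0 * max_log2_gain
    return (ldr + EPS) * np.exp2(log_gain) - EPS


# Synthetic data: a linear LDR image in [0, 1] and an HDR image up to 8x brighter.
rng = np.random.default_rng(0)
ldr = rng.uniform(0.0, 1.0, size=(4, 4, 3))
hdr = ldr * rng.uniform(1.0, 8.0, size=(4, 4, 3))

gm = encode_gain_map(ldr, hdr)
recon = apply_gain_map(ldr, gm)
```

Under these assumptions the GM is stored at the same bit depth as the LDR input, and the only loss in the round trip comes from quantizing the log-domain gain to 256 levels. In the framework described above, the GM would be estimated by the diffusion model rather than computed from a known HDR target as in this sketch.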