Video super-resolution (VSR) aims to reconstruct a high-resolution (HR) video from a low-resolution (LR) counterpart. Successful VSR requires producing realistic HR details while ensuring both spatial and temporal consistency. To restore realistic details, diffusion-based VSR approaches have recently been proposed. However, the inherent randomness of diffusion models, combined with the tile-based processing these approaches rely on, often leads to spatio-temporal inconsistencies. In this paper, we propose DC-VSR, a novel VSR approach that produces spatially and temporally consistent results with realistic textures. To achieve spatial and temporal consistency, DC-VSR adopts a novel Spatial Attention Propagation (SAP) scheme and a novel Temporal Attention Propagation (TAP) scheme, which propagate information across spatio-temporal tiles through the self-attention mechanism. To enhance high-frequency details, we also introduce Detail-Suppression Self-Attention Guidance (DSSAG), a novel diffusion guidance scheme. Comprehensive experiments demonstrate that DC-VSR achieves spatially and temporally consistent, high-quality VSR results, outperforming previous approaches.
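To make the idea of propagating information across tiles through self-attention concrete, the following is a minimal NumPy sketch of one generic way to do it: the keys and values of one tile are extended with tokens taken from another tile, so that the tile's queries can attend outside their own boundary. This is an illustrative assumption, not the SAP or TAP scheme itself, whose details the abstract does not specify. The function name `tile_attention`, the parameters `extra_k` and `extra_v`, and the tile sizes are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def tile_attention(q, k, v, extra_k=None, extra_v=None):
    """Scaled dot-product self-attention for a single tile.

    q, k, v: arrays of shape (n_tokens, dim) computed from one tile.
    extra_k, extra_v: optional arrays of shape (m_tokens, dim) taken from
    outside the tile (hypothetical; stands in for propagated information).
    """
    if extra_k is not None and extra_v is not None:
        # Let this tile's queries also attend to tokens from another tile.
        k = np.concatenate([k, extra_k], axis=0)
        v = np.concatenate([v, extra_v], axis=0)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


# Two toy tiles with 16 tokens of dimension 8 each (arbitrary sizes).
rng = np.random.default_rng(0)
tile_a = rng.standard_normal((16, 8))
tile_b = rng.standard_normal((16, 8))

# Tile A processed in isolation versus with access to tile B's tokens.
isolated = tile_attention(tile_a, tile_a, tile_a)
shared = tile_attention(tile_a, tile_a, tile_a, extra_k=tile_b, extra_v=tile_b)
```

In this sketch the output shape for tile A is unchanged, but its values now depend on tile B as well, which is the basic property any cross-tile propagation scheme needs in order to reduce inconsistencies between independently processed tiles.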