Enhancing Space-time Video Super-resolution via Spatial-temporal Feature Interaction

The goal of space-time video super-resolution (STVSR) is to increase both the frame rate (also referred to as the temporal resolution) and the spatial resolution of a given video. Recent approaches solve STVSR with end-to-end deep neural networks. A popular solution is to first increase the frame rate of the video, then refine the features across frames, and finally increase the spatial resolution of these features. The temporal correlation among features of different frames is carefully exploited in this process. The spatial correlation among features of different (spatial) resolutions, although also very important, is not emphasized. In this paper, we propose a spatial-temporal feature interaction network that enhances STVSR by exploiting both spatial and temporal correlations among features of different frames and spatial resolutions. Specifically, a spatial-temporal frame interpolation module interpolates low- and high-resolution intermediate frame features simultaneously and interactively. Spatial-temporal local and global refinement modules are then deployed to exploit the spatial-temporal correlation among different features for their refinement. Finally, a novel motion consistency loss is employed to enhance the motion continuity among reconstructed frames. We conduct experiments on three standard benchmarks, Vid4, Vimeo-90K and Adobe240, and the results demonstrate that our method outperforms state-of-the-art methods by a considerable margin. Our code will be available at https://github.com/yuezijie/STINet-Space-time-Video-Super-resolution.
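
To make the ordering of the conventional pipeline concrete, the following is a minimal sketch of the three stages the abstract describes for prior approaches: increase the frame rate, refine features across frames, then increase the spatial resolution. It is not the proposed network. Every function name here is hypothetical, and each learned module is replaced by a simple stand-in (averaging adjacent frames, blending with temporal neighbours, nearest-neighbour upsampling) chosen only for illustration.

```python
from typing import List

Frame = List[List[float]]  # hypothetical toy type: one H x W feature map


def interpolate_frames(frames: List[Frame]) -> List[Frame]:
    """Stage 1 (stand-in): insert the average of each adjacent pair of frames."""
    out: List[Frame] = []
    for prev, nxt in zip(frames, frames[1:]):
        mid = [[(a + b) / 2.0 for a, b in zip(r0, r1)] for r0, r1 in zip(prev, nxt)]
        out.extend([prev, mid])
    out.append(frames[-1])  # assumes at least one input frame
    return out


def refine_temporally(frames: List[Frame], weight: float = 0.5) -> List[Frame]:
    """Stage 2 (stand-in): blend each frame with the mean of its temporal neighbours."""
    refined: List[Frame] = []
    for t, frame in enumerate(frames):
        neighbours = [frames[i] for i in (t - 1, t + 1) if 0 <= i < len(frames)]
        if not neighbours:  # a single frame has nothing to blend with
            refined.append([row[:] for row in frame])
            continue
        new_frame: Frame = []
        for y, row in enumerate(frame):
            new_row = []
            for x, value in enumerate(row):
                mean_nb = sum(n[y][x] for n in neighbours) / len(neighbours)
                new_row.append((1.0 - weight) * value + weight * mean_nb)
            new_frame.append(new_row)
        refined.append(new_frame)
    return refined


def upsample_nearest(frame: Frame, scale: int = 2) -> Frame:
    """Stage 3 (stand-in): nearest-neighbour spatial upsampling by an integer factor."""
    out: Frame = []
    for row in frame:
        wide = [value for value in row for _ in range(scale)]
        out.extend([wide[:] for _ in range(scale)])
    return out


def stvsr_baseline(frames: List[Frame], scale: int = 2) -> List[Frame]:
    """Run the three stages strictly in sequence."""
    interpolated = interpolate_frames(frames)
    refined = refine_temporally(interpolated)
    return [upsample_nearest(f, scale) for f in refined]


if __name__ == "__main__":
    low_res = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
    high_res = stvsr_baseline(low_res)
    print(len(high_res), len(high_res[0]), len(high_res[0][0]))
```

In this sequential arrangement the high-resolution features are produced only in the last stage, so low- and high-resolution features never inform each other. That is the gap the abstract points to when it says the spatial correlation among features of different resolutions is not emphasized.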
