High-speed, high-resolution stereoscopic (H2-Stereo) video allows us to perceive dynamic 3D content at fine granularity. The acquisition of H2-Stereo video, however, remains challenging with commodity cameras. Existing spatial super-resolution or temporal frame interpolation methods provide compromised solutions that lack temporal or spatial details, respectively. To alleviate this problem, we propose a dual-camera system, in which one camera captures high-spatial-resolution low-frame-rate (HSR-LFR) videos with rich spatial details, and the other captures low-spatial-resolution high-frame-rate (LSR-HFR) videos with smooth temporal details. We then devise a Learned Information Fusion network (LIFnet) that exploits the cross-camera redundancies to enhance both camera views to high spatiotemporal resolution (HSTR), reconstructing the H2-Stereo video effectively. We utilize a disparity network to transfer spatiotemporal information across views even in large-disparity scenes, based on which we propose disparity-guided flow-based warping for the LSR-HFR view and complementary warping for the HSR-LFR view. A multi-scale fusion method in the feature domain is proposed to minimize occlusion-induced warping ghosts and holes in the HSR-LFR view. The LIFnet is trained in an end-to-end manner using our high-quality Stereo Video dataset collected from YouTube. Extensive experiments demonstrate that our model outperforms existing state-of-the-art methods for both views on synthetic data and on camera-captured real data with large disparity. Ablation studies explore various aspects of our system, including spatiotemporal resolution, camera baseline, camera desynchronization, long/short exposures, and applications, to fully understand its capability for potential use cases.
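To illustrate the cross-view transfer step at its simplest, the sketch below performs backward warping of one camera view toward the other using a per-pixel horizontal disparity map, with an explicit validity mask for the out-of-range samples that cause warping holes. This is a minimal, hypothetical illustration: the function name `warp_by_disparity`, the nearest-neighbour sampling, and the hole mask are our assumptions, not the paper's actual disparity-guided flow-based warping in LIFnet, which additionally uses optical flow and feature-domain fusion.

```python
import numpy as np

def warp_by_disparity(src, disparity):
    """Backward-warp a source view toward the target view.

    For each target pixel (y, x), sample the source at (y, x + disparity),
    i.e. a purely horizontal shift as in rectified stereo. Samples that
    fall outside the source image are marked invalid (warping holes).

    src:       (H, W) float array, source-view intensities
    disparity: (H, W) float array, horizontal disparity in pixels
    Returns (warped, valid_mask).
    """
    h, w = src.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.round(xs + disparity).astype(int)  # nearest-neighbour sampling
    valid = (src_x >= 0) & (src_x < w)            # in-bounds source columns
    warped = np.zeros_like(src)
    warped[valid] = src[ys[valid], src_x[valid]]
    return warped, valid
```

A constant disparity simply shifts the image horizontally; with real disparity maps the invalid/occluded regions returned in `valid` are exactly the ghost- and hole-prone areas that the paper's multi-scale feature-domain fusion is designed to repair.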