Region-of-Interest Based Neural Video Compression

Humans do not perceive all parts of a scene at the same resolution, but rather focus on a few regions of interest (ROIs). Traditional object-based codecs take advantage of this biological intuition and can allocate bits non-uniformly in favor of salient regions, at the expense of increased distortion in the remaining areas; this strategy improves perceptual quality under low-rate constraints. Recently, several neural codecs have been introduced for video compression, yet they operate uniformly over all spatial locations and lack the capability of ROI-based processing. In this paper, we introduce two models for ROI-based neural video coding. First, we propose an implicit model that is fed a binary ROI mask and is trained by de-emphasizing the distortion of the background. Second, we design an explicit latent scaling method that allows control over the quantization bin width for different spatial regions of the latent variables, conditioned on the ROI mask. Through extensive experiments, we show that our methods outperform all our baselines in terms of Rate-Distortion (R-D) performance in the ROI. Moreover, they generalize to different datasets and to any arbitrary ROI at inference time. Finally, they do not require expensive pixel-level annotations during training, as synthetic ROI masks can be used with little to no degradation in performance. To the best of our knowledge, our proposals are the first solutions that integrate ROI-based capabilities into neural video compression models.
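The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives neither the exact loss nor how the bin widths are produced, so the function names, the `bg_weight` parameter, and the fixed `roi_step` / `bg_step` values are all hypothetical stand-ins. The first function down-weights background distortion, as the implicit model's training objective is described; the second quantizes a latent with a coarser bin width outside the ROI, as the latent scaling method is described. The mask is assumed to have already been resized to the shape of the array it is applied to.

```python
import numpy as np


def roi_weighted_mse(x, x_hat, roi_mask, bg_weight=0.1):
    """Mean squared error with background pixels de-emphasized.

    Assumption: ROI pixels (mask > 0) get weight 1.0 and background
    pixels get the hypothetical weight `bg_weight` (< 1).
    """
    weights = np.where(roi_mask > 0, 1.0, bg_weight)
    return float(np.mean(weights * (x - x_hat) ** 2))


def quantize_with_roi_binwidth(latent, roi_mask, roi_step=1.0, bg_step=4.0):
    """Round each latent element to a grid whose spacing depends on the mask.

    Assumption: fixed step sizes stand in for whatever bin widths a
    trained model would produce from the ROI mask. A larger step means
    coarser quantization, hence fewer bits and more distortion.
    """
    step = np.where(roi_mask > 0, roi_step, bg_step)
    return np.round(latent / step) * step


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((4, 4))
    recon = frame + 0.05 * rng.standard_normal((4, 4))
    mask = np.zeros((4, 4))
    mask[1:3, 1:3] = 1.0  # a synthetic rectangular ROI

    print("ROI-weighted MSE:", roi_weighted_mse(frame, recon, mask))
    print(quantize_with_roi_binwidth(10.0 * frame, mask))
```

With `bg_step` larger than `roi_step`, background latents collapse onto fewer distinct values, which is the mechanism by which bits would be shifted toward the ROI in such a scheme.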

