Video coding, which aims to compress and reconstruct the whole frame, and feature compression, which preserves and transmits only the most critical information, stand at two ends of the scale. Feature compression offers compactness and efficiency in the service of machine vision, whereas video coding offers full fidelity in deference to human perception. Recent efforts in emerging video compression trends, e.g. deep-learning-based coding tools and end-to-end image/video coding, and in the MPEG-7 compact feature descriptor standards, i.e. Compact Descriptors for Visual Search (CDVS) and Compact Descriptors for Video Analysis (CDVA), have driven rapid and sustained progress in each of these directions. In this paper, building on the rapid advance of AI technology, e.g. prediction and generation models, we explore a new area, Video Coding for Machines (VCM), which arises from the emerging MPEG standardization efforts. Aiming at collaborative compression and intelligent analytics, VCM attempts to bridge the gap between feature coding for machine vision and video coding for human vision. In line with Digital Retina, an emerging instance of the Analyze-then-Compress paradigm, we first give the definition, formulation, and paradigm of VCM. We then systematically review state-of-the-art techniques in video compression and feature compression from the perspective of MPEG standardization, which provides academic and industrial evidence for realizing the collaborative compression of video and feature streams across a broad range of AI applications. Finally, we propose potential VCM solutions, and preliminary results demonstrate performance and efficiency gains. Future directions are also discussed.