Effective Receptive field (ERF) plays an important role in transform coding, which determines how much redundancy can be removed at most during transform and how many spatial priors can be utilized to synthesize textures during inverse transform. Existing methods rely on stacks of small kernels, whose ERF remains not large enough instead, or heavy non-local attention mechanisms, which limit the potential of high resolution image coding. To tackle this issue, we propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression (LLIC). Specifically, for the first time in learned image compression community, we introduce a few large kernel-based depth-wise convolutions to reduce more redundancy while maintaining modest complexity. Due to wide range of image diversity, we propose to enhance the adaptability of convolutions via generating weights in a self-conditioned manner. The large kernels cooperate with non-linear embedding and gate mechanisms for better expressiveness and lighter point-wise interactions. We also investigate improved training techniques to fully exploit the potential of large kernels. In addition, to enhance the interactions among channels, we propose the adaptive channel-wise bit allocation via generating channel importance factor in a self-conditioned manner. To demonstrate the effectiveness of proposed transform coding, we align the entropy model to compare with existing transform methods and obtain models LLIC-STF, LLIC-ELIC, LLIC-TCM. Extensive experiments demonstrate our proposed LLIC models have significant improvements over corresponding baselines and achieve state-of-the-art performances and better trade-off between performance and complexity.
翻译:暂无翻译