In this paper, we present Uformer, an effective and efficient Transformer-based architecture, in which we build a hierarchical encoder-decoder network using the Transformer block for image restoration. Uformer has two core designs that make it suitable for this task. The first key element is a locally-enhanced window Transformer block, where we use non-overlapping window-based self-attention to reduce the computational requirement and employ a depth-wise convolution in the feed-forward network to further improve its ability to capture local context. The second key element is that we explore three skip-connection schemes to effectively deliver information from the encoder to the decoder. Powered by these two designs, Uformer has a high capability for capturing the dependencies that are useful for image restoration. Extensive experiments on several image restoration tasks, including image denoising, deraining, deblurring, and demoireing, demonstrate the superiority of Uformer. We expect that our work will encourage further research on exploring Transformer-based architectures for low-level vision tasks. The code and models will be available at https://github.com/ZhendongWang6/Uformer.
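The two core designs above can be illustrated with a minimal NumPy sketch: partitioning a feature map into non-overlapping windows so self-attention runs within each window (cost O(H·W·w²·C) instead of the O((H·W)²·C) of global attention), and a per-channel (depth-wise) 3×3 convolution as the local-context step of the feed-forward network. The function names, shapes, and identity Q/K/V projections are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def window_partition(x, w):
    """Split a (H, W, C) feature map into non-overlapping (w*w, C) windows."""
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C)
    # -> (num_windows, w*w, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, w * w, C)

def window_self_attention(x, w):
    """Softmax self-attention computed independently inside each window.
    Identity Q/K/V projections keep the sketch parameter-free."""
    windows = window_partition(x, w)                  # (N, w*w, C)
    C = windows.shape[-1]
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(C)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ windows                             # (N, w*w, C)

def depthwise_conv3x3(x, kernels):
    """Per-channel 3x3 convolution (the local-context step in the FFN)."""
    H, W, C = x.shape
    pad = np.pad(x, ((1, 1), (1, 1), (0, 0)))         # zero-pad spatial dims
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + 3, j:j + 3, :]          # (3, 3, C) neighborhood
            out[i, j] = (patch * kernels).sum(axis=(0, 1))
    return out

# Toy 8x8 feature map with 4 channels.
feat = np.random.rand(8, 8, 4)
attn_out = window_self_attention(feat, w=4)           # 4 windows of 16 tokens
local = depthwise_conv3x3(feat, np.ones((3, 3, 4)) / 9.0)  # channel-wise mean filter
```

In the full architecture these operations are learned layers (projections for Q, K, V and trained depth-wise kernels) stacked inside a hierarchical encoder-decoder; the sketch only shows why windowing bounds the attention cost and how the depth-wise convolution injects local context.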