Transformer architectures show spectacular performance on NLP tasks and have recently also been used for tasks such as image completion and image classification. Here we propose a sequential image representation in which each prefix of the complete sequence describes the whole image at reduced resolution. With such Fourier Domain Encodings (FDEs), an auto-regressive image completion task is equivalent to predicting a higher-resolution output given a low-resolution input. Additionally, we show that an encoder-decoder setup can be used to query arbitrary Fourier coefficients given a set of Fourier domain observations. We demonstrate the practicality of this approach in the context of computed tomography (CT) image reconstruction. In summary, we show that the Fourier Image Transformer (FIT) can be used to solve relevant image analysis tasks in Fourier space, a domain inherently inaccessible to convolutional architectures.
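The prefix property of such an encoding can be illustrated with a minimal NumPy sketch. It assumes one plausible ordering, namely the real-FFT coefficients sorted from low to high radial frequency; the abstract does not specify the exact ordering or normalization, and the function names `fde_order`, `encode`, and `decode_prefix` are hypothetical.

```python
import numpy as np

def fde_order(shape):
    """Flat indices of the rfft2 coefficients, sorted from low to high radial frequency.

    Assumption: a simple radial ordering; the actual FDE may order coefficients differently.
    """
    h, w = shape
    fy = np.fft.fftfreq(h) * h       # signed integer vertical frequencies
    fx = np.fft.rfftfreq(w) * w      # non-negative horizontal frequencies
    radius = np.hypot(fy[:, None], fx[None, :])
    return np.argsort(radius.ravel(), kind="stable")

def encode(image):
    """Turn a real-valued image into a 1D sequence of complex Fourier coefficients."""
    coeffs = np.fft.rfft2(image)
    order = fde_order(image.shape)
    return coeffs.ravel()[order], order

def decode_prefix(sequence, order, shape, n):
    """Reconstruct an image from the first n coefficients; the rest are set to zero."""
    h, w = shape
    full = np.zeros(h * (w // 2 + 1), dtype=complex)
    full[order[:n]] = sequence[:n]
    return np.fft.irfft2(full.reshape(h, w // 2 + 1), s=shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((16, 16))
    seq, order = encode(img)
    # Longer prefixes add higher frequencies, so the reconstruction error shrinks.
    for n in (1, 10, 50, len(seq)):
        err = np.linalg.norm(decode_prefix(seq, order, img.shape, n) - img)
        print(n, err)
```

In this sketch a prefix of length 1 holds only the zero-frequency coefficient and decodes to a constant image at the mean intensity, while the full sequence recovers the image up to floating-point error. Predicting the next coefficients from a prefix therefore corresponds to adding finer detail to a blurrier version of the same image.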