Considering the spectral properties of images, we propose a new self-attention mechanism whose computational complexity is greatly reduced, down to linear. To better preserve edges while promoting similarity within objects, we propose individualized processing over different frequency bands. In particular, we study the case where processing is applied only to low-frequency components. Through an ablation study, we show that low-frequency self-attention can achieve performance very close to, or better than, full-frequency self-attention, even without retraining the network. Accordingly, we design novel plug-and-play modules and embed them into the head of a CNN, a network that we refer to as FsaNet. The frequency self-attention 1) takes low-frequency coefficients as input, 2) is mathematically equivalent to spatial-domain self-attention with a linear structure, and 3) simplifies the token-mapping ($1\times1$ convolution) stage and the token-mixing stage simultaneously. We show that the frequency self-attention requires $87.29\% \sim 90.04\%$ less memory, $96.13\% \sim 98.07\%$ fewer FLOPs, and $97.56\% \sim 98.18\%$ less running time than regular self-attention. Compared with other ResNet101-based self-attention networks, FsaNet achieves a new state-of-the-art result ($83.0\%$ mIoU) on the Cityscapes test set and competitive results on ADE20k and VOCaug.
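The core idea of restricting self-attention to low-frequency coefficients can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it assumes a 2-D DCT as the spectral transform, keeps only the top-left $k \times k$ low-frequency block, applies plain dot-product self-attention over those $k^2$ coefficients as tokens (with an identity token mapping for simplicity), and leaves all higher frequencies untouched. The function name and parameters are hypothetical.

```python
import numpy as np
from scipy.fft import dctn, idctn


def low_freq_self_attention(x, k=8):
    """Sketch: self-attention over only the k x k low-frequency DCT block.

    x: (C, H, W) feature map. Hypothetical simplification of FsaNet's
    frequency self-attention; the paper's actual modules may differ.
    """
    C, H, W = x.shape
    # 2-D DCT per channel; low frequencies sit in the top-left corner.
    X = dctn(x, axes=(1, 2), norm="ortho")
    # Treat the k*k low-frequency coefficients as tokens with C-dim features.
    low = X[:, :k, :k].reshape(C, -1).T          # (k*k, C)
    # Plain scaled dot-product self-attention (identity token mapping here).
    scores = low @ low.T / np.sqrt(C)            # (k*k, k*k)
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn = scores / scores.sum(axis=1, keepdims=True)
    low_out = attn @ low                         # (k*k, C)
    # Write refined low-frequency coefficients back; invert the DCT.
    X[:, :k, :k] = low_out.T.reshape(C, k, k)
    return idctn(X, axes=(1, 2), norm="ortho")
```

This is where the complexity reduction comes from: attention runs over $k^2$ coefficients instead of $H \times W$ pixels, so its cost no longer grows quadratically with image size, while all frequencies outside the low-frequency block (and hence fine edge detail) pass through unchanged.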