Green Hierarchical Vision Transformer for Masked Image Modeling

We present an efficient approach for Masked Image Modeling (MIM) with hierarchical Vision Transformers (ViTs) that lets hierarchical ViTs discard masked patches and operate only on the visible ones. The approach consists of three key designs. First, for window attention, we propose a Group Window Attention scheme that follows a divide-and-conquer strategy. To mitigate the quadratic complexity of self-attention with respect to the number of patches, group attention encourages a uniform partition: the visible patches within each local window, which may be of arbitrary size, are gathered into groups of equal size, and masked self-attention is then performed within each group. Second, we further improve the grouping strategy with a dynamic programming algorithm that minimizes the overall computation cost of attention on the grouped patches. Third, we convert the convolution layers to sparse convolution, which works seamlessly with sparse data, i.e., the visible patches in MIM. As a result, MIM can now work on most, if not all, hierarchical ViTs in a green and efficient way. For example, we can train hierarchical ViTs such as Swin Transformer and Twins Transformer about 2.7$\times$ faster and reduce GPU memory usage by 70%, while still achieving competitive performance on ImageNet classification and superior performance on downstream COCO object detection benchmarks. Code and pre-trained models are publicly available at https://github.com/LayneH/GreenMIM.
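To make the grouping idea concrete, the following is a minimal sketch, not the authors' implementation. It packs windows that have uneven numbers of visible patches into groups of a fixed size, then builds the attention mask that keeps patches from different windows from attending to each other. The helper names `pack_windows` and `group_attention_mask` are hypothetical, and the packing uses a simple greedy first-fit heuristic as a stand-in for the dynamic programming step described in the abstract.

```python
def pack_windows(visible_counts, group_size):
    """Pack windows into groups holding at most `group_size` visible patches.

    Assumption: greedy first-fit-decreasing packing, used here only for
    illustration. The abstract describes a dynamic programming algorithm.
    """
    groups = []  # each group is a list of window indices
    loads = []   # number of visible patches already placed in each group
    order = sorted(range(len(visible_counts)), key=lambda i: -visible_counts[i])
    for w in order:
        n = visible_counts[w]
        if n == 0:
            continue  # fully masked window: nothing to attend over
        if n > group_size:
            raise ValueError("a window has more visible patches than group_size")
        for g, load in enumerate(loads):
            if load + n <= group_size:
                groups[g].append(w)
                loads[g] += n
                break
        else:
            groups.append([w])
            loads.append(n)
    return groups


def group_attention_mask(group, visible_counts, group_size):
    """Return a group_size x group_size boolean mask.

    Entry [i][j] is True when patch i may attend to patch j, i.e. both
    belong to the same original window. Padding slots attend to nothing.
    """
    owner = []
    for w in group:
        owner.extend([w] * visible_counts[w])
    owner.extend([None] * (group_size - len(owner)))  # padding slots
    return [[a is not None and a == b for b in owner] for a in owner]


if __name__ == "__main__":
    counts = [3, 1, 4, 2, 0, 2]  # visible patches per window (made-up numbers)
    groups = pack_windows(counts, group_size=4)
    print(groups)
    for row in group_attention_mask(groups[1], counts, 4):
        print(row)
```

With these made-up counts, attention runs over three groups of 4 slots each (3 × 4² = 48 pairwise scores) instead of over all 12 visible patches at once (12² = 144), which illustrates why equal-size grouping reduces the quadratic cost. A real implementation would also need to handle rows that consist only of padding before applying softmax.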

