Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space

This paper explores the feasibility of finding an optimal sub-model from a vision transformer and introduces a pure vision transformer slimming (ViT-Slim) framework. It searches for a sub-structure of the original model end-to-end across multiple dimensions, including the input tokens and the MHSA and MLP modules, and achieves state-of-the-art performance. Our method is based on a learnable and unified $\ell_1$ sparsity constraint with pre-defined factors that reflect the global importance in the continuous search space of the different dimensions. The search is highly efficient thanks to a single-shot training scheme: on DeiT-S, for instance, ViT-Slim takes only ~43 GPU hours to search, and the searched structure is flexible, with diverse dimensionalities in different modules. A budget threshold is then applied according to the accuracy-FLOPs trade-off required on the target devices, and a re-training step produces the final model. Extensive experiments show that ViT-Slim can remove up to 40% of parameters and 40% of FLOPs on various vision transformers while increasing accuracy by ~0.6% on ImageNet. We also demonstrate the advantage of our searched models on several downstream datasets. Our code is available at https://github.com/Arnav0400/ViT-Slim.
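To make the two main ingredients of the abstract concrete, the following is a minimal sketch of (a) an $\ell_1$ penalty on soft masks, weighted by a pre-defined factor per dimension, and (b) a single global threshold that turns the soft masks into a keep/prune decision. It is an illustration under stated assumptions, not the authors' implementation: the mask names, mask values, factor values, and the helper functions `l1_sparsity_penalty` and `apply_budget` are all hypothetical. The budget is also simplified to a fraction of mask entries, whereas the abstract describes a budget chosen from the accuracy-FLOPs trade-off.

```python
from typing import Dict, List, Tuple


def l1_sparsity_penalty(masks: Dict[str, List[float]],
                        factors: Dict[str, float]) -> float:
    """Sum over dimensions of factor * L1 norm of that dimension's soft mask.

    During the search this term would be added to the task loss
    (an assumption about how the constraint is applied).
    """
    return sum(factors[name] * sum(abs(v) for v in values)
               for name, values in masks.items())


def apply_budget(masks: Dict[str, List[float]],
                 keep_ratio: float) -> Tuple[float, Dict[str, List[bool]]]:
    """Rank all mask entries globally and keep the largest `keep_ratio` fraction.

    Returns the threshold and a per-dimension boolean keep mask.
    Entries tied with the threshold are all kept.
    """
    scores = sorted((abs(v) for values in masks.values() for v in values),
                    reverse=True)
    n_keep = max(1, round(keep_ratio * len(scores)))
    threshold = scores[n_keep - 1]
    kept = {name: [abs(v) >= threshold for v in values]
            for name, values in masks.items()}
    return threshold, kept


# Hypothetical soft-mask values, standing in for the result of a search.
masks = {
    "attn_head_dims": [0.9, 0.05, 0.7, 0.01],
    "mlp_hidden": [0.8, 0.6, 0.02, 0.03],
    "tokens": [0.95, 0.4],
}
# Hypothetical pre-defined sparsity factors, one per dimension.
factors = {"attn_head_dims": 1e-4, "mlp_hidden": 1e-4, "tokens": 1e-4}

penalty = l1_sparsity_penalty(masks, factors)
threshold, kept = apply_budget(masks, keep_ratio=0.6)
```

Because the threshold is shared across all dimensions, each module can end up with a different number of surviving units, which is one way to read the "diverse dimensionalities in different modules" mentioned above.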

