RSIR Transformer: Hierarchical Vision Transformer using Random Sampling Windows and Important Region Windows


April 13, 2023


Zhemin Zhang, Xun Gong

Recently, Transformers have shown promising performance in various vision tasks. However, the high cost of global self-attention remains a challenge for Transformers, especially in high-resolution vision tasks. Local self-attention restricts attention computation to a limited region for efficiency, which leads to insufficient context modeling because its receptive field is small. In this work, we introduce two new attention modules to enhance the global modeling capability of the hierarchical vision transformer: random sampling windows (RS-Win) and important region windows (IR-Win). Specifically, RS-Win samples random image patches from a uniform distribution to compose each window, i.e., the patches in an RS-Win window can come from any position in the image. IR-Win composes each window according to the weights of the image patches in the attention map. Notably, RS-Win is able to capture global information throughout the entire model, even in the early, high-resolution stages. IR-Win enables the self-attention module to focus on important regions of the image and capture more informative features. Incorporating these designs, the RSIR-Win Transformer demonstrates competitive performance on common vision tasks.
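The abstract describes the two window types only at a high level, so the following is a minimal, hypothetical sketch in plain Python rather than the authors' implementation. It assumes that RS-Win shuffles all patch indices uniformly at random and cuts them into equal-sized windows. For IR-Win, it assumes that patches are ranked by an importance score, namely the mean attention each patch receives in the previous layer's window attention map (an assumed scoring rule), and that patches with neighbouring ranks are grouped into the same window. Learned Q/K/V projections, multiple heads, and the hierarchical stages are omitted. All function names are illustrative only.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def rs_windows(num_patches: int, window_size: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical RS-Win partition: shuffle all patch indices uniformly at
    random, then cut the shuffled order into equal windows, so one window can
    hold patches from anywhere in the image."""
    if num_patches % window_size != 0:
        raise ValueError("num_patches must be divisible by window_size")
    order = list(range(num_patches))
    random.Random(seed).shuffle(order)
    return [order[i:i + window_size] for i in range(0, num_patches, window_size)]


def ir_windows(importance: Sequence[float], window_size: int) -> List[List[int]]:
    """Hypothetical IR-Win partition: rank patches by an attention-derived
    importance score (highest first) and group consecutive ranks, so the
    most important patches share a window."""
    n = len(importance)
    if n % window_size != 0:
        raise ValueError("number of patches must be divisible by window_size")
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    return [order[i:i + window_size] for i in range(0, n, window_size)]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(tokens: Sequence[Vector], idx: Sequence[int]) -> List[List[float]]:
    """Row-wise softmax(q_i . k_j / sqrt(d)) over the patches in `idx`,
    with Q = K = token features (no learned projections, for brevity)."""
    d = len(tokens[0])
    return [
        softmax([sum(a * b for a, b in zip(tokens[i], tokens[j])) / math.sqrt(d) for j in idx])
        for i in idx
    ]


def window_self_attention(tokens: Sequence[Vector], windows: Sequence[Sequence[int]]) -> List[Vector]:
    """Single-head attention computed only inside each window (V = token
    features); each output is written back to its patch's original position."""
    d = len(tokens[0])
    out: List[Vector] = [[0.0] * d for _ in tokens]
    for win in windows:
        for i, weights in zip(win, attention_weights(tokens, win)):
            out[i] = [sum(w * tokens[j][k] for w, j in zip(weights, win)) for k in range(d)]
    return out


def importance_from_windows(tokens: Sequence[Vector], windows: Sequence[Sequence[int]]) -> List[float]:
    """Assumed scoring rule: the mean attention a patch receives from the
    queries in its own window (a column mean of that window's attention map)."""
    scores = [0.0] * len(tokens)
    for win in windows:
        rows = attention_weights(tokens, win)
        for col, j in enumerate(win):
            scores[j] = sum(row[col] for row in rows) / len(win)
    return scores


if __name__ == "__main__":
    rng = random.Random(1)
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(16)]  # 16 patches, dim 4
    rs = rs_windows(len(tokens), window_size=4, seed=0)
    y = window_self_attention(tokens, rs)  # RS-Win layer
    # Importance scores come from the RS-Win layer's attention maps.
    ir = ir_windows(importance_from_windows(tokens, rs), window_size=4)
    z = window_self_attention(y, ir)  # IR-Win layer
    print("RS-Win windows:", rs)
    print("IR-Win windows:", ir)
```

Each window attends over only `window_size` patches, so the cost of this sketch grows as O(N · window_size · d) rather than the O(N² · d) of global attention. Because RS-Win windows draw patches from the whole image, even an early, high-resolution stage can mix information across distant patches at that windowed cost. This is the trade-off the abstract motivates.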

