Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors - 专知 (Zhuanzhi) Papers

Sparsity · Extensibility · Projection · Guidance · Attention

August 24, 2022

Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

Gongjie Zhang, Zhipeng Luo, Yingchen Yu, Zichen Tian, Jingyi Zhang, Shijian Lu

Source: arXiv. Project page: https://github.com/ZhangGongjie/IMFA

Multi-scale features have been proven highly effective for object detection, and most ConvNet-based object detectors adopt Feature Pyramid Network (FPN) as a basic component for exploiting multi-scale features. However, for the recently proposed Transformer-based object detectors, directly incorporating multi-scale features leads to prohibitive computational overhead due to the high complexity of the attention mechanism for processing high-resolution features. This paper presents Iterative Multi-scale Feature Aggregation (IMFA) -- a generic paradigm that enables the efficient use of multi-scale features in Transformer-based object detectors. The core idea is to exploit sparse multi-scale features from just a few crucial locations, and it is achieved with two novel designs. First, IMFA rearranges the Transformer encoder-decoder pipeline so that the encoded features can be iteratively updated based on the detection predictions. Second, IMFA sparsely samples scale-adaptive features for refined detection from just a few keypoint locations under the guidance of prior detection predictions. As a result, the sampled multi-scale features are sparse yet still highly beneficial for object detection. Extensive experiments show that the proposed IMFA boosts the performance of multiple Transformer-based object detectors significantly yet with slight computational overhead. Project page: https://github.com/ZhangGongjie/IMFA.
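
The sparse sampling idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the project page for that); it only shows, under stated assumptions, how features can be gathered from a few locations on every level of a feature pyramid instead of feeding all multi-scale tokens to attention. The function names `bilinear_sample` and `sample_sparse_multiscale`, the pyramid sizes, the channel count, and the random keypoints are all hypothetical. In IMFA the keypoints come from prior detection predictions and the sampled features are scale-adaptive; the abstract does not give those details, so they are not modeled here.

```python
import numpy as np


def bilinear_sample(feat, xy):
    """Bilinearly sample a (C, H, W) feature map at normalized (x, y) points.

    xy has shape (N, 2) with values in [0, 1]; returns an (N, C) array.
    """
    _, h, w = feat.shape
    xy = np.clip(xy, 0.0, 1.0)
    # Map normalized coordinates to pixel coordinates (corners map to corners).
    x = xy[:, 0] * (w - 1)
    y = xy[:, 1] * (h - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, w - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    wx = x - x0  # horizontal interpolation weight
    wy = y - y0  # vertical interpolation weight
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return (top * (1 - wy) + bottom * wy).T


def sample_sparse_multiscale(pyramid, keypoints):
    """Gather one feature vector per keypoint per pyramid level.

    pyramid: list of L arrays shaped (C, H_l, W_l).
    keypoints: (K, 2) normalized (x, y) locations (assumed to be given).
    Returns an array of shape (K, L, C).
    """
    return np.stack([bilinear_sample(f, keypoints) for f in pyramid], axis=1)


# Hypothetical setup: three pyramid levels, 16 channels, 40 keypoints.
rng = np.random.default_rng(0)
sizes = [(100, 100), (50, 50), (25, 25)]
pyramid = [rng.standard_normal((16, h, w)) for h, w in sizes]
keypoints = rng.uniform(0.0, 1.0, size=(40, 2))  # placeholder locations

tokens = sample_sparse_multiscale(pyramid, keypoints)
dense_tokens = sum(h * w for h, w in sizes)        # every location on every level
sparse_tokens = tokens.shape[0] * tokens.shape[1]  # keypoints x levels
print("dense tokens:", dense_tokens, "sparse tokens:", sparse_tokens)
```

Because standard attention cost grows with the number of tokens it attends over, attending to the sampled tokens rather than to every location on every level is where the savings in this sketch would come from. The token counts printed here depend entirely on the hypothetical sizes above and say nothing about the paper's reported results.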
