DPT: Deformable Patch-based Transformer for Visual Recognition - 专知 Papers


Transformer · Extensibility · Object Detection · Image Classification · Mask R-CNN

July 30, 2021

DPT: Deformable Patch-based Transformer for Visual Recognition


Zhiyang Chen, Yousong Zhu, Chaoyang Zhao, Guosheng Hu, Wei Zeng, Jinqiao Wang, Ming Tang

From arXiv. In Proceedings of the 29th ACM International Conference on Multimedia (MM '21).

Transformers have achieved great success in computer vision, but how to split an image into patches remains an open problem. Existing methods usually use a fixed-size patch embedding, which might destroy the semantics of objects. To address this problem, we propose a new Deformable Patch (DePatch) module, which learns to adaptively split images into patches with different positions and scales in a data-driven way rather than using predefined fixed patches. In this way, our method can preserve the semantics within patches well. DePatch works as a plug-and-play module that can easily be incorporated into different transformers and trained end to end. We term this DePatch-embedded transformer the Deformable Patch-based Transformer (DPT) and conduct extensive evaluations of DPT on image classification and object detection. Results show that DPT achieves 81.9% top-1 accuracy on ImageNet classification, and 43.7% box mAP with RetinaNet and 44.3% with Mask R-CNN on MSCOCO object detection. Code is available at: https://github.com/CASIA-IVA-Lab/DPT
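The abstract does not say how a patch with a learned position and scale is actually read out of the image. The sketch below is one hypothetical way to illustrate the idea, not the authors' implementation: each patch is a box given by a fixed centre (cx, cy), an offset (dx, dy), and a size (sw, sh), and k x k values are sampled uniformly inside the box with bilinear interpolation. The function names `bilinear` and `sample_patch`, all parameter names, and the choice of bilinear sampling are assumptions made for this example. In a real model the offsets and sizes would be predicted by the network and trained end to end; here they are plain numbers.

```python
import math


def bilinear(img, x, y):
    """Sample a 2D grid (list of rows) at fractional (x, y), clamping to the border.

    img[r][c] is treated as the value at point (x=c, y=r).
    """
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def sample_patch(img, cx, cy, dx, dy, sw, sh, k):
    """Return k*k values sampled uniformly inside a box of size (sw, sh)
    centred at (cx + dx, cy + dy). Hypothetical helper for illustration only.
    """
    left = cx + dx - sw / 2.0
    top = cy + dy - sh / 2.0
    values = []
    for i in range(k):
        for j in range(k):
            # Sample at the centre of each of the k x k cells of the box.
            x = left + (j + 0.5) * sw / k
            y = top + (i + 0.5) * sh / k
            values.append(bilinear(img, x, y))
    return values


# A 4x4 toy "image" whose value at (x, y) is 4*y + x.
image = [[4 * r + c for c in range(4)] for r in range(4)]

# Zero offset and box size equal to the patch size reproduces a fixed 2x2 patch.
fixed = sample_patch(image, cx=0.5, cy=0.5, dx=0, dy=0, sw=2, sh=2, k=2)

# A non-zero offset moves the patch; a larger box covers more of the image.
shifted = sample_patch(image, cx=0.5, cy=0.5, dx=1, dy=1, sw=2, sh=2, k=2)
scaled = sample_patch(image, cx=1.5, cy=1.5, dx=0, dy=0, sw=4, sh=4, k=2)
```

With zero offset and a box the size of the original patch, the sketch returns exactly the pixels of the fixed patch, so a fixed-size split is the special case of this sampling scheme. Because the number of sampled values stays at k x k regardless of box size, the output can feed the same downstream projection as a fixed patch would.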



