Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers - Zhuanzhi Papers


July 25, 2021

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers


Sixiao Zheng,Jiachen Lu,Hengshuang Zhao,Xiatian Zhu,Zekun Luo,Yabiao Wang,Yanwei Fu,Jianfeng Feng,Tao Xiang,Philip H. S. Torr,Li Zhang

From arXiv; CVPR 2021. Project page: https://fudan-zvg.github.io/SETR/

Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have focused on increasing the receptive field, through either dilated/atrous convolutions or inserted attention modules. However, the encoder-decoder based FCN architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer (i.e., without convolution and resolution reduction) to encode an image as a sequence of patches. With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). Extensive experiments show that SETR achieves a new state of the art on ADE20K (50.28% mIoU) and Pascal Context (55.83% mIoU), and competitive results on Cityscapes. In particular, we achieved first place on the highly competitive ADE20K test server leaderboard on the day of submission.
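To make the "image as a sequence of patches" idea concrete, here is a minimal pure-Python sketch. It is an illustration only, not the authors' implementation: the function names, the toy 4x4 image, and the patch size of 2 are all assumptions, and the learned projection, positional information, and transformer layers that would follow in a real model are omitted.

```python
from typing import List

Image = List[List[List[float]]]  # H x W x C nested lists (hypothetical layout)


def image_to_patch_sequence(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W x C image into non-overlapping patch x patch tiles and
    flatten each tile into one vector, visiting tiles in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    sequence = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            vector = []
            for row in range(top, top + patch):
                for col in range(left, left + patch):
                    vector.extend(image[row][col])  # append all channels
            sequence.append(vector)
    return sequence


def sequence_to_grid(sequence: list, grid_h: int, grid_w: int) -> list:
    """Reshape a sequence of grid_h * grid_w tokens back into a 2D grid,
    the layout a decoder would need before upsampling to pixel resolution."""
    if len(sequence) != grid_h * grid_w:
        raise ValueError("sequence length does not match the grid size")
    return [sequence[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]


# Toy 4x4 three-channel image whose pixel value encodes its position.
image = [[[float(r * 4 + c)] * 3 for c in range(4)] for r in range(4)]
tokens = image_to_patch_sequence(image, patch=2)  # 4 tokens of length 12
grid = sequence_to_grid(tokens, 2, 2)             # back to a 2x2 token grid
```

Because the sequence length stays fixed through this step, the token grid keeps the same spatial layout from input to output, which is one way to read the abstract's "without resolution reduction".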


