以变换器从序列到序列的视角重新思考语义分割 (Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers) - 专知论文

会员服务 ·

0

FCN · 变换 · Extensibility · 可约的 · MoDELS ·

2020 年 12 月 31 日

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

翻译：以变换器从序列到序列的视角重新思考语义分割

Sixiao Zheng,Jiachen Lu,Hengshuang Zhao,Xiatian Zhu,Zekun Luo,Yabiao Wang,Yanwei Fu,Jianfeng Feng,Tao Xiang,Philip H. S. Torr,Li Zhang

from arxiv, project page at https://fudan-zvg.github.io/SETR/

Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have been focused on increasing the receptive field, through either dilated/atrous convolutions or inserting attention modules. However, the encoder-decoder based FCN architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer (ie, without convolution and resolution reduction) to encode an image as a sequence of patches. With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). Extensive experiments show that SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes. Particularly, we achieve the first (44.42% mIoU) position in the highly competitive ADE20K test server leaderboard.

翻译：最新的语义分解方法采用了完全进化的网络(FCN), 并配有编码器解码器结构。编码器逐渐减少空间分辨率, 并用更大的可接收字段学习更抽象/ 语义的视觉概念。由于背景模型对于分解至关重要, 最近的努力侧重于通过放大/ 突变或插入注意模块来增加可接收字段。但是, 以 FCN 结构为基础的编码器解码器结构保持不变。在本文中, 我们的目标是提供另一种观点, 将语义分解作为序列到序列的预测任务。具体地说, 我们部署一个纯的变异器( 即, 不变异和分辨率减少) 来将图像编码为补补码序列。随着在变异器的每一层中建模的全球环境, 这个编码器可以与一个简单的解码器结合起来, 以提供一个强大的分解模型, 称为 SEgmentationU20Exexexexe( SETR) 。广泛的实验显示, SETRTR在ADE20K( 50.28% 和 Excoreal ial i) Excial 555 I 和我们的MISal 528 的MI) 上, 5255, 5, 5, 5, 5, 和和 528 518 的 mI 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, m.I, 5, m.I, 5, m.I, 5, 5, 5, 5, 5, m.I, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,

10

相关内容

FCN

【ICML2020】文本摘要生成模型PEGASUS

【ICML2020】文本摘要生成模型PEGASUS

专知会员服务

34+阅读 · 2020年8月23日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

14+阅读 · 2020年5月5日

【IJCAI2020】神经摘要结构性注意力，Neural Abstractive Summarization with Structural Attention

【IJCAI2020】神经摘要结构性注意力，Neural Abstractive Summarization with Structural Attention

专知会员服务

32+阅读 · 2020年4月24日

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

专知会员服务

49+阅读 · 2020年3月7日

基于深度学习的图像语义分割技术研究进展，Research on Progress of Image Semantic Segmentation Based on Deep Learning

基于深度学习的图像语义分割技术研究进展，Research on Progress of Image Semantic Segmentation Based on Deep Learning

专知会员服务

60+阅读 · 2020年2月16日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

14+阅读 · 2020年2月1日

【EMNLP2019Keynote报告】神经序列模型， Neural Sequence Models，63页ppt

【EMNLP2019Keynote报告】神经序列模型， Neural Sequence Models，63页ppt

专知会员服务

25+阅读 · 2019年11月10日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

57+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

53+阅读 · 2019年10月17日

PyTorch语义分割开源库semseg

PyTorch语义分割开源库semseg

极市平台

25+阅读 · 2019年6月6日

CVPR2019 | Decoders 对于语义分割的重要性

CVPR2019 | Decoders 对于语义分割的重要性

极市平台

6+阅读 · 2019年3月19日

TorchSeg：基于pytorch的语义分割算法开源了

TorchSeg：基于pytorch的语义分割算法开源了

极市平台

20+阅读 · 2019年1月28日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

《pyramid Attention Network for Semantic Segmentation》

《pyramid Attention Network for Semantic Segmentation》

统计学习与视觉计算组

44+阅读 · 2018年8月30日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

(TensorFlow)实时语义分割比较研究

(TensorFlow)实时语义分割比较研究

机器学习研究会

9+阅读 · 2018年3月12日

语义分割+视频分割开源代码集合

语义分割+视频分割开源代码集合

极市平台

35+阅读 · 2018年3月5日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

10+阅读 · 2017年11月12日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Graph Transformer for Graph-to-Sequence Learning

Graph Transformer for Graph-to-Sequence Learning

Arxiv

4+阅读 · 2019年11月30日

RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds

Arxiv

10+阅读 · 2019年11月25日

ShelfNet for Real-time Semantic Segmentation

Arxiv

7+阅读 · 2018年12月10日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

A Fully Convolutional Two-Stream Fusion Network for Interactive Image Segmentation

A Fully Convolutional Two-Stream Fusion Network for Interactive Image Segmentation

Arxiv

5+阅读 · 2018年7月6日

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

Arxiv

5+阅读 · 2018年6月4日

Mix-and-Match Tuning for Self-Supervised Semantic Segmentation

Arxiv

8+阅读 · 2018年1月30日

Long-term Visual Localization using Semantically Segmented Images

Arxiv

7+阅读 · 2018年1月16日

Fully Convolutional Networks for Semantic Segmentation

Arxiv

3+阅读 · 2015年3月8日

VIP会员

文章信息

相关主题

相关VIP内容

【ICML2020】文本摘要生成模型PEGASUS

【ICML2020】文本摘要生成模型PEGASUS

专知会员服务

34+阅读 · 2020年8月23日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

14+阅读 · 2020年5月5日

【IJCAI2020】神经摘要结构性注意力，Neural Abstractive Summarization with Structural Attention

【IJCAI2020】神经摘要结构性注意力，Neural Abstractive Summarization with Structural Attention

专知会员服务

32+阅读 · 2020年4月24日

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

专知会员服务

49+阅读 · 2020年3月7日

基于深度学习的图像语义分割技术研究进展，Research on Progress of Image Semantic Segmentation Based on Deep Learning

基于深度学习的图像语义分割技术研究进展，Research on Progress of Image Semantic Segmentation Based on Deep Learning

专知会员服务

60+阅读 · 2020年2月16日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

14+阅读 · 2020年2月1日

【EMNLP2019Keynote报告】神经序列模型， Neural Sequence Models，63页ppt

【EMNLP2019Keynote报告】神经序列模型， Neural Sequence Models，63页ppt

专知会员服务

25+阅读 · 2019年11月10日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

57+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

53+阅读 · 2019年10月17日

热门VIP内容

相关资讯

PyTorch语义分割开源库semseg

PyTorch语义分割开源库semseg

极市平台

25+阅读 · 2019年6月6日

CVPR2019 | Decoders 对于语义分割的重要性

CVPR2019 | Decoders 对于语义分割的重要性

极市平台

6+阅读 · 2019年3月19日

TorchSeg：基于pytorch的语义分割算法开源了

TorchSeg：基于pytorch的语义分割算法开源了

极市平台

20+阅读 · 2019年1月28日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

Facebook PyText 在 Github 上开源了

Facebook PyText 在 Github 上开源了

AINLP

7+阅读 · 2018年12月14日

《pyramid Attention Network for Semantic Segmentation》

《pyramid Attention Network for Semantic Segmentation》

统计学习与视觉计算组

44+阅读 · 2018年8月30日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

(TensorFlow)实时语义分割比较研究

(TensorFlow)实时语义分割比较研究

机器学习研究会

9+阅读 · 2018年3月12日

语义分割+视频分割开源代码集合

语义分割+视频分割开源代码集合

极市平台

35+阅读 · 2018年3月5日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

10+阅读 · 2017年11月12日

相关论文

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Graph Transformer for Graph-to-Sequence Learning

Graph Transformer for Graph-to-Sequence Learning

Arxiv

4+阅读 · 2019年11月30日

RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds

Arxiv

10+阅读 · 2019年11月25日

ShelfNet for Real-time Semantic Segmentation

Arxiv

7+阅读 · 2018年12月10日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

A Fully Convolutional Two-Stream Fusion Network for Interactive Image Segmentation

A Fully Convolutional Two-Stream Fusion Network for Interactive Image Segmentation

Arxiv

5+阅读 · 2018年7月6日

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

Arxiv

5+阅读 · 2018年6月4日

Mix-and-Match Tuning for Self-Supervised Semantic Segmentation

Arxiv

8+阅读 · 2018年1月30日

Long-term Visual Localization using Semantically Segmented Images

Arxiv

7+阅读 · 2018年1月16日

Fully Convolutional Networks for Semantic Segmentation

Arxiv

3+阅读 · 2015年3月8日

微信扫码咨询专知VIP会员