用于图像说明的半自动递减变换器 (Semi-Autoregressive Transformer for Image Captioning) - 专知论文

会员服务 ·

0

图像字幕 · MoDELS · 推断 · Better · 变换 ·

2021 年 8 月 17 日

Semi-Autoregressive Transformer for Image Captioning

翻译：用于图像说明的半自动递减变换器

Yuanen Zhou,Yong Zhang,Zhenzhen Hu,Meng Wang

from arxiv, Revised

Current state-of-the-art image captioning models adopt autoregressive decoders, \ie they generate each word by conditioning on previously generated words, which leads to heavy latency during inference. To tackle this issue, non-autoregressive image captioning models have recently been proposed to significantly accelerate the speed of inference by generating all words in parallel. However, these non-autoregressive models inevitably suffer from large generation quality degradation since they remove words dependence excessively. To make a better trade-off between speed and quality, we introduce a semi-autoregressive model for image captioning~(dubbed as SATIC), which keeps the autoregressive property in global but generates words parallelly in local . Based on Transformer, there are only a few modifications needed to implement SATIC. Experimental results on the MSCOCO image captioning benchmark show that SATIC can achieve a good trade-off without bells and whistles. Code is available at {\color{magenta}\url{https://github.com/YuanEZhou/satic}}.

翻译：目前最先进的图像字幕模型采用自动递减解码器, \ 它们以先前生成的单词为条件生成每个单词, 从而导致在推断过程中出现严重延迟。为了解决这一问题, 最近提出了非自动递减式图像字幕模型, 以通过平行生成所有单词来大大加快推论速度。但是, 这些非递减型模型不可避免地会因代代代相传的质量大幅退化, 因为它们消除了对单词的过度依赖性。为了在速度和质量之间实现更好的平衡, 我们引入了一个图像字幕( 以 SATIC ) 的半自动递增模式, 将自动递增属性保留在全球, 并同时生成本地的单词。基于变换器, 只需要做几处修改即可实施 SAPTIC 。 MOCO 图像字幕基准的实验结果显示, SATIC 可以实现良好的交易, 没有钟和哨子。代码可在 {colora{ murl{https://github.com/ YuanEzou/stical_ 。

1

相关内容

图像字幕

图像字幕（Image Captioning）,是指从图像生成文本描述的过程，主要根据图像中物体和物体的动作。

最新《图像描述Image Captioning》综述论文，22页pdf220篇文献

专知会员服务

41+阅读 · 2021年7月17日

一文概览 CVPR2021 最新18篇 Oral 论文

专知会员服务

25+阅读 · 2021年3月7日

图像分类半监督自监督无监督学习综述，A survey on Semi-, Self- and Unsupervised Learning for Image Classification

图像分类半监督自监督无监督学习综述，A survey on Semi-, Self- and Unsupervised Learning for Image Classification

专知会员服务

45+阅读 · 2020年7月29日

【伯克利】自回归模型的局部掩卷积，Locally Masked Convolution for Autoregressive Models

【伯克利】自回归模型的局部掩卷积，Locally Masked Convolution for Autoregressive Models

专知会员服务

19+阅读 · 2020年6月23日

【哈佛-ICLR2020】基于残差能量模型的文本生成，Residual Energy-Based Models for Text Generation

【哈佛-ICLR2020】基于残差能量模型的文本生成，Residual Energy-Based Models for Text Generation

专知会员服务

10+阅读 · 2020年4月27日

【CVPR2020-中科院计算所】弱监督语义分割的自监督等价注意力机制，Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

【CVPR2020-中科院计算所】弱监督语义分割的自监督等价注意力机制，Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

专知会员服务

75+阅读 · 2020年4月10日

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

专知会员服务

51+阅读 · 2020年3月3日

【课程】《终身学习、可解释ML、异常检测、对抗攻击》一览讲解，台大李宏毅老师2019机器学习课程讲义PPT

【课程】《终身学习、可解释ML、异常检测、对抗攻击》一览讲解，台大李宏毅老师2019机器学习课程讲义PPT

专知会员服务

82+阅读 · 2019年10月29日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

30+阅读 · 2019年10月17日

CVPR 2019 | 重磅！34篇 CVPR2019 论文实现代码

CVPR 2019 | 重磅！34篇 CVPR2019 论文实现代码

AI研习社

11+阅读 · 2019年6月21日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

25+阅读 · 2019年5月18日

Image Captioning 36页最新综述， 161篇参考文献

Image Captioning 36页最新综述， 161篇参考文献

专知

89+阅读 · 2018年10月23日

【论文推荐】最新六篇图像描述生成相关论文—字符级推断、视觉解释、语义对齐、实体感知、确定性非自回归

【论文推荐】最新六篇图像描述生成相关论文—字符级推断、视觉解释、语义对齐、实体感知、确定性非自回归

专知

15+阅读 · 2018年5月28日

自适应注意力机制在Image Caption中的应用

自适应注意力机制在Image Caption中的应用

PaperWeekly

10+阅读 · 2018年5月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

10+阅读 · 2017年11月12日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Semi-Autoregressive Image Captioning

Arxiv

0+阅读 · 2021年10月13日

Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment

Arxiv

3+阅读 · 2021年6月11日

Scene-based Factored Attention for Image Captioning

Arxiv

4+阅读 · 2019年8月7日

Image Captioning: Transforming Objects into Words

Image Captioning: Transforming Objects into Words

Arxiv

7+阅读 · 2019年6月14日

Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning

Arxiv

3+阅读 · 2019年6月11日

Progressive Pose Attention Transfer for Person Image Generation

Progressive Pose Attention Transfer for Person Image Generation

Arxiv

5+阅读 · 2019年4月9日

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Arxiv

5+阅读 · 2018年12月26日

Unsupervised Image Captioning

Arxiv

7+阅读 · 2018年11月27日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

14+阅读 · 2018年9月19日

CNN+CNN: Convolutional Decoders for Image Captioning

Arxiv

21+阅读 · 2018年5月23日

VIP会员

文章信息

相关主题

相关VIP内容

最新《图像描述Image Captioning》综述论文，22页pdf220篇文献

专知会员服务

41+阅读 · 2021年7月17日

一文概览 CVPR2021 最新18篇 Oral 论文

专知会员服务

25+阅读 · 2021年3月7日

图像分类半监督自监督无监督学习综述，A survey on Semi-, Self- and Unsupervised Learning for Image Classification

图像分类半监督自监督无监督学习综述，A survey on Semi-, Self- and Unsupervised Learning for Image Classification

专知会员服务

45+阅读 · 2020年7月29日

【伯克利】自回归模型的局部掩卷积，Locally Masked Convolution for Autoregressive Models

【伯克利】自回归模型的局部掩卷积，Locally Masked Convolution for Autoregressive Models

专知会员服务

19+阅读 · 2020年6月23日

【哈佛-ICLR2020】基于残差能量模型的文本生成，Residual Energy-Based Models for Text Generation

【哈佛-ICLR2020】基于残差能量模型的文本生成，Residual Energy-Based Models for Text Generation

专知会员服务

10+阅读 · 2020年4月27日

【CVPR2020-中科院计算所】弱监督语义分割的自监督等价注意力机制，Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

【CVPR2020-中科院计算所】弱监督语义分割的自监督等价注意力机制，Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation

专知会员服务

75+阅读 · 2020年4月10日

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

专知会员服务

51+阅读 · 2020年3月3日

【课程】《终身学习、可解释ML、异常检测、对抗攻击》一览讲解，台大李宏毅老师2019机器学习课程讲义PPT

【课程】《终身学习、可解释ML、异常检测、对抗攻击》一览讲解，台大李宏毅老师2019机器学习课程讲义PPT

专知会员服务

82+阅读 · 2019年10月29日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

45+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

30+阅读 · 2019年10月17日

热门VIP内容

相关资讯

CVPR 2019 | 重磅！34篇 CVPR2019 论文实现代码

CVPR 2019 | 重磅！34篇 CVPR2019 论文实现代码

AI研习社

11+阅读 · 2019年6月21日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

25+阅读 · 2019年5月18日

Image Captioning 36页最新综述， 161篇参考文献

Image Captioning 36页最新综述， 161篇参考文献

专知

89+阅读 · 2018年10月23日

【论文推荐】最新六篇图像描述生成相关论文—字符级推断、视觉解释、语义对齐、实体感知、确定性非自回归

【论文推荐】最新六篇图像描述生成相关论文—字符级推断、视觉解释、语义对齐、实体感知、确定性非自回归

专知

15+阅读 · 2018年5月28日

自适应注意力机制在Image Caption中的应用

自适应注意力机制在Image Caption中的应用

PaperWeekly

10+阅读 · 2018年5月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

10+阅读 · 2017年11月12日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

Semi-Autoregressive Image Captioning

Arxiv

0+阅读 · 2021年10月13日

Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment

Arxiv

3+阅读 · 2021年6月11日

Scene-based Factored Attention for Image Captioning

Arxiv

4+阅读 · 2019年8月7日

Image Captioning: Transforming Objects into Words

Image Captioning: Transforming Objects into Words

Arxiv

7+阅读 · 2019年6月14日

Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning

Arxiv

3+阅读 · 2019年6月11日

Progressive Pose Attention Transfer for Person Image Generation

Progressive Pose Attention Transfer for Person Image Generation

Arxiv

5+阅读 · 2019年4月9日

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

Arxiv

5+阅读 · 2018年12月26日

Unsupervised Image Captioning

Arxiv

7+阅读 · 2018年11月27日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

14+阅读 · 2018年9月19日

CNN+CNN: Convolutional Decoders for Image Captioning

Arxiv

21+阅读 · 2018年5月23日

微信扫码咨询专知VIP会员