Attention modules connecting encoders and decoders have been widely applied in object recognition, image captioning, visual question answering, and neural machine translation, and significantly improve performance. In this paper, we propose a bottom-up gated hierarchical attention (GHA) mechanism for image captioning. Our model employs a CNN as the decoder, which learns different concepts at different layers; these concepts in turn correspond to different regions of an image. We therefore develop GHA, in which low-level concepts are merged into high-level concepts while low-level attended features are simultaneously passed to the top layers to make predictions. GHA significantly improves over a model with only single-level attention: for example, the CIDEr score increases from 0.923 to 0.999, which is comparable to state-of-the-art models that employ attribute boosting and reinforcement learning (RL). We also conduct extensive experiments to analyze the CNN decoder and the proposed GHA, and find that deeper decoders do not yield better performance; moreover, as the convolutional decoder becomes deeper, the model becomes more likely to collapse during training.
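To make the gating idea concrete, the following is a minimal sketch of merging attended features from two decoder levels. This is not the authors' implementation: the dot-product attention scoring and the scalar gate are illustrative assumptions standing in for the learned components described in the paper.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_features, query):
    # Dot-product attention: weight each image-region feature by its
    # similarity to the query, then return the weighted sum.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    return [sum(w * feat[d] for w, feat in zip(weights, region_features))
            for d in range(dim)]

def gated_hierarchical_attention(low_feats, high_feats, query, gate):
    # Attend separately at the low and high levels, then merge the two
    # attended vectors with a gate in [0, 1] (here a fixed scalar; in the
    # model it would be learned): gate * high + (1 - gate) * low.
    low = attend(low_feats, query)
    high = attend(high_feats, query)
    return [gate * h + (1.0 - gate) * l for h, l in zip(high, low)]
```

With `gate = 1.0` the output reduces to pure high-level attention, and with `gate = 0.0` to pure low-level attention; intermediate values blend the two, which is the bottom-up merging behavior the abstract describes.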