Retrieval-Augmented Transformer for Image Captioning - 专知 (Zhuanzhi) Papers


Image captioning · Knowledge · MoDELS · kNN · External memory

July 26, 2022

Retrieval-Augmented Transformer for Image Captioning


Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

From arXiv, CBMI 2022

Image captioning models aim at connecting Vision and Language by providing natural language descriptions of input images. In the past few years, the task has been tackled by learning parametric models and proposing visual feature extraction advancements or by modeling better multi-modal connections. In this paper, we investigate the development of an image captioning approach with a kNN memory, with which knowledge can be retrieved from an external corpus to aid the generation process. Our architecture combines a knowledge retriever based on visual similarities, a differentiable encoder, and a kNN-augmented attention layer to predict tokens based on the past context and on text retrieved from the external memory. Experimental results, conducted on the COCO dataset, demonstrate that employing an explicit external memory can aid the generation process and increase caption quality. Our work opens up new avenues for improving image captioning models at larger scale.

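The retrieval step described in the abstract (a knowledge retriever based on visual similarities that pulls text from an external memory) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes image features have already been extracted by some visual encoder, uses brute-force cosine similarity in NumPy, and every name in it (`l2_normalize`, `retrieve_captions`, the toy memory) is hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so that dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve_captions(query_feat, memory_feats, memory_captions, k=2):
    """Return (caption, similarity) pairs for the k memory images closest to the query.

    query_feat:      (d,) feature vector of the input image (assumed precomputed)
    memory_feats:    (n, d) feature vectors of the images in the external memory
    memory_captions: list of n captions, one per memory image
    """
    q = l2_normalize(query_feat[None, :])   # shape (1, d)
    m = l2_normalize(memory_feats)          # shape (n, d)
    sims = (m @ q.T).ravel()                # cosine similarity to each memory entry
    top = np.argsort(-sims)[:k]             # indices of the k most similar entries
    return [(memory_captions[i], float(sims[i])) for i in top]


# Toy external memory: four image features with their captions (made-up data).
memory_feats = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
memory_captions = [
    "a dog runs on grass",
    "a puppy plays in a park",
    "a plate of pasta",
    "a plane in the sky",
]

query = np.array([1.0, 0.05, 0.0])
results = retrieve_captions(query, memory_feats, memory_captions, k=2)
for caption, score in results:
    print(f"{score:.3f}  {caption}")
```

In a full system along the lines the abstract describes, the retrieved captions would be encoded and exposed to the kNN-augmented attention layer alongside the past context; a large memory would also likely call for an approximate nearest-neighbor index rather than the brute-force search assumed here.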


Image captioning

Image captioning is the process of generating a text description from an image, based mainly on the objects in the image and their actions.

