ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval

July 29, 2022

Nicola Messina, Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Fabrizio Falchi, Giuseppe Amato, Rita Cucchiara

From arXiv; CBMI 2022

Image-text matching is gaining a leading role among tasks involving the joint understanding of vision and language. In the literature, this task is often used as a pre-training objective to build architectures able to jointly handle images and texts. Nonetheless, it has a direct downstream application: cross-modal retrieval, which consists of finding images related to a given query text or vice versa. Solving this task is of critical importance in cross-modal search engines. Many recent methods have proposed effective solutions to the image-text matching problem, mostly using large vision-language (VL) Transformer networks. However, these models are often computationally expensive, especially at inference time. This prevents their adoption in large-scale cross-modal retrieval scenarios, where results should be provided to the user almost instantaneously. In this paper, we propose to bridge the gap between effectiveness and efficiency with an ALign And DIstill Network (ALADIN). ALADIN first produces highly effective scores by aligning images and texts at a fine-grained level. Then, it learns a shared embedding space, where an efficient kNN search can be performed, by distilling the relevance scores obtained from the fine-grained alignments. We obtain remarkable results on MS-COCO, showing that our method can compete with state-of-the-art VL Transformers while being almost 90 times faster. The code for reproducing our results is available at https://github.com/mesnico/ALADIN.
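
The two ideas in the abstract, distilling relevance scores into a shared embedding space and then retrieving with kNN, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the ALADIN code base: the function names, the use of cosine similarity as the student score, the KL divergence over softmax-normalized scores as the distillation objective (a generic stand-in; the abstract does not specify the loss), and the brute-force search in place of a real kNN index.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def softmax(scores):
    """Softmax over a list of scores, shifted by the maximum for stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_scores, img_embs, txt_embs, temperature=1.0):
    """Mean KL divergence between teacher and student score distributions.

    teacher_scores[i][j] is an assumed fine-grained relevance score of
    text j for image i (hypothetical input, e.g. from an alignment head).
    Student scores are cosine similarities in the shared embedding space.
    """
    total = 0.0
    for i, img in enumerate(img_embs):
        student = softmax([cosine(img, txt) / temperature for txt in txt_embs])
        teacher = softmax(teacher_scores[i])
        total += sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    return total / len(img_embs)


def knn_search(query_emb, index_embs, k):
    """Brute-force kNN: (index, similarity) of the k closest entries, best first."""
    sims = [(i, cosine(query_emb, emb)) for i, emb in enumerate(index_embs)]
    return sorted(sims, key=lambda pair: -pair[1])[:k]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for learned image and text vectors.
    images = [[1.0, 0.0], [0.0, 1.0]]
    texts = [[1.0, 0.0], [0.0, 1.0]]
    teacher = [[1.0, 0.0], [0.0, 1.0]]
    print("distillation loss:", distillation_loss(teacher, images, texts))
    print("top-1 image for first text:", knn_search(texts[0], images, k=1))
```

At query time only `knn_search` is needed: the expensive fine-grained scores are used during training alone, which is the source of the efficiency the abstract claims. In a large-scale setting the brute-force loop would presumably be replaced by an approximate nearest-neighbor index.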
