用于维基百科图像描述匹配的基于变换器的多模式建议和再兰克 (Transformer-Based Multi-modal Proposal and Re-Rank for Wikipedia Image-Caption Matching) - 专知论文

会员服务 ·

0

维基百科 · Extensibility · CC · Kaggle · 推断 ·

2022 年 6 月 21 日

Transformer-Based Multi-modal Proposal and Re-Rank for Wikipedia Image-Caption Matching

翻译：用于维基百科图像描述匹配的基于变换器的多模式建议和再兰克

Nicola Messina,Davide Alessandro Coccomini,Andrea Esuli,Fabrizio Falchi

from arxiv, Accepted for publication at the Wiki-M3L workshop, co-located with ICLR 2022

With the increased accessibility of web and online encyclopedias, the amount of data to manage is constantly increasing. In Wikipedia, for example, there are millions of pages written in multiple languages. These pages contain images that often lack the textual context, remaining conceptually floating and therefore harder to find and manage. In this work, we present the system we designed for participating in the Wikipedia Image-Caption Matching challenge on Kaggle, whose objective is to use data associated with images (URLs and visual data) to find the correct caption among a large pool of available ones. A system able to perform this task would improve the accessibility and completeness of multimedia content on large online encyclopedias. Specifically, we propose a cascade of two models, both powered by the recent Transformer model, able to efficiently and effectively infer a relevance score between the query image data and the captions. We verify through extensive experimentation that the proposed two-model approach is an effective way to handle a large pool of images and captions while maintaining bounded the overall computational complexity at inference time. Our approach achieves remarkable results, obtaining a normalized Discounted Cumulative Gain (nDCG) value of 0.53 on the private leaderboard of the Kaggle challenge.

翻译：随着网络和在线百科全书的可获取性不断提高,管理的数据数量正在不断增加。例如,在维基百科中,有数百万页以多种语言写成。这些页面的图像往往缺乏文字背景,在概念上仍然浮现,因此更难找到和管理。在这项工作中,我们展示了我们设计用于参与维基百科图像与卡格尔匹配挑战的维基百科图像-Caption匹配的系统,该系统的目标是利用与图像相关的数据(URLs和视觉数据)在大量可用图像库中找到正确的标题。一个能够完成这项任务的系统将改善大型在线百科全书多媒体内容的可获取性和完整性。具体地说,我们提出由最近的变换模型驱动的两套模式的系列模式,能够高效和有效地推导出查询图像数据与字幕之间的关联性分数。我们通过广泛的实验核实,拟议的两套模式方法是处理大量图像和字幕的有效方法,同时在推论时将总体计算复杂性捆绑在一起。我们的方法取得了显著的成果,即获取了正常的私人加固度挑战(nDC)。

0

相关内容

维基百科

维基百科（ http://Wikipedia.org）是一个基于 Wiki 技术的全球性多语言百科全书协作项目，同时也是一部在网际网络上呈现的网络百科全书网站，其目标及宗旨是为全人类提供自由的百科全书。目前 Alexa 全球网站排名第六。

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

HIF-1/COMPASS调控缺氧诱导Brg1和Brm表达上调的机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

外源性外泌体对mTOR通路参与柯萨奇病毒B3诱导细胞凋亡的调控作用及机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

大数据环境下稀有类数据挖掘研究

国家自然科学基金

3+阅读 · 2015年12月31日

PSMA通过TRAF6和TTC3调控前列腺癌细胞自噬在CRPC产生过程中的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

肝细胞癌Gd-EOB-DTPA增强MR成像肝胆期信号的分子病理机制及对恶性程度与预后评估的研究

国家自然科学基金

0+阅读 · 2014年12月31日

DNA水凝胶-阳离子共轭聚合物杂化荧光纳米粒子的制备及其应用于肿瘤细胞成像与治疗研究

国家自然科学基金

0+阅读 · 2014年12月31日

稀土MOF纳米荧光探针的设计合成及其生物应用

国家自然科学基金

0+阅读 · 2013年12月31日

稀土上转换粒子/聚偶氮苯复合中空纳米粒子的合成、NIR光触发载药释放与荧光成像

国家自然科学基金

0+阅读 · 2013年12月31日

糖皮质激素抵抗介导的小胶质细胞过度激活在脑外伤后CIRCI发病中的作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

How and what to learn:The modes of machine learning

Arxiv

0+阅读 · 2022年8月8日

Improving Fuzzy-Logic based Map-Matching Method with Trajectory Stay-Point Detection

Arxiv

0+阅读 · 2022年8月4日

Reliability analysis of discrete-state performance functions via adaptive sequential sampling with detection of failure surfaces

Arxiv

0+阅读 · 2022年8月4日

Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos

Arxiv

0+阅读 · 2022年8月4日

Image/Video Deep Anomaly Detection: A Survey

Arxiv

16+阅读 · 2021年3月2日

Reverse Attention for Salient Object Detection

Arxiv

11+阅读 · 2019年4月15日

Learning Discrete Structures for Graph Neural Networks

Arxiv

17+阅读 · 2019年3月28日

From Knowledge Graph Embedding to Ontology Embedding: Region Based Representations of Relational Structures

Arxiv

10+阅读 · 2018年5月26日

Generative Adversarial Networks and Probabilistic Graph Models for Hyperspectral Image Classification

Arxiv

11+阅读 · 2018年2月10日

DeepSeek: Content Based Image Search & Retrieval

Arxiv

13+阅读 · 2018年1月11日

VIP会员

文章信息

相关主题

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型智能体强化学习：全景综述

《城市滨海地区：理解复杂多变环境下的指挥控制框架》50页报告

【伯克利博士论文】从推理服务到训练：面向大规模 LLM 智能体的高效系统

美空军“顶点2025”实验：推进AI在C2、动态目标锁定与联盟集成中的应用

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

How and what to learn:The modes of machine learning

Arxiv

0+阅读 · 2022年8月8日

Improving Fuzzy-Logic based Map-Matching Method with Trajectory Stay-Point Detection

Arxiv

0+阅读 · 2022年8月4日

Reliability analysis of discrete-state performance functions via adaptive sequential sampling with detection of failure surfaces

Arxiv

0+阅读 · 2022年8月4日

Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos

Arxiv

0+阅读 · 2022年8月4日

Image/Video Deep Anomaly Detection: A Survey

Arxiv

16+阅读 · 2021年3月2日

Reverse Attention for Salient Object Detection

Arxiv

11+阅读 · 2019年4月15日

Learning Discrete Structures for Graph Neural Networks

Arxiv

17+阅读 · 2019年3月28日

From Knowledge Graph Embedding to Ontology Embedding: Region Based Representations of Relational Structures

Arxiv

10+阅读 · 2018年5月26日

Generative Adversarial Networks and Probabilistic Graph Models for Hyperspectral Image Classification

Arxiv

11+阅读 · 2018年2月10日

DeepSeek: Content Based Image Search & Retrieval

Arxiv

13+阅读 · 2018年1月11日

相关基金

HIF-1/COMPASS调控缺氧诱导Brg1和Brm表达上调的机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

外源性外泌体对mTOR通路参与柯萨奇病毒B3诱导细胞凋亡的调控作用及机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

大数据环境下稀有类数据挖掘研究

国家自然科学基金

3+阅读 · 2015年12月31日

PSMA通过TRAF6和TTC3调控前列腺癌细胞自噬在CRPC产生过程中的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

肝细胞癌Gd-EOB-DTPA增强MR成像肝胆期信号的分子病理机制及对恶性程度与预后评估的研究

国家自然科学基金

0+阅读 · 2014年12月31日

DNA水凝胶-阳离子共轭聚合物杂化荧光纳米粒子的制备及其应用于肿瘤细胞成像与治疗研究

国家自然科学基金

0+阅读 · 2014年12月31日

稀土MOF纳米荧光探针的设计合成及其生物应用

国家自然科学基金

0+阅读 · 2013年12月31日

稀土上转换粒子/聚偶氮苯复合中空纳米粒子的合成、NIR光触发载药释放与荧光成像

国家自然科学基金

0+阅读 · 2013年12月31日

糖皮质激素抵抗介导的小胶质细胞过度激活在脑外伤后CIRCI发病中的作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Arisandilactone A 的不对称全合成

国家自然科学基金

0+阅读 · 2012年12月31日

微信扫码咨询专知VIP会员