WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning - 专知 (Zhuanzhi) Papers

March 2, 2021

WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning

Krishna Srinivasan, Karthik Raman, Jiecao Chen, Michael Bendersky, Marc Najork

The milestone improvements brought about by deep representation learning and pre-training techniques have led to large performance gains across downstream NLP, IR and Vision tasks. Multimodal modeling techniques aim to leverage large, high-quality visio-linguistic datasets for learning complementary information across image and text modalities. In this paper, we introduce the Wikipedia-based Image Text (WIT) Dataset (https://github.com/google-research-datasets/wit) to better facilitate multimodal, multilingual learning. WIT is composed of a curated set of 37.6 million entity-rich image-text examples with 11.5 million unique images across 108 Wikipedia languages. Its size enables WIT to be used as a pretraining dataset for multimodal models, as we show when applied to downstream tasks such as image-text retrieval. WIT has four main and unique advantages. First, WIT is the largest multimodal dataset by number of image-text examples, by a factor of 3 (at the time of writing). Second, WIT is massively multilingual (the first of its kind), with coverage of over 100 languages (each of which has at least 12K examples), and provides cross-lingual texts for many images. Third, WIT represents a more diverse set of concepts and real-world entities relative to what previous datasets cover. Lastly, WIT provides a very challenging real-world test set, as we empirically illustrate using an image-text retrieval task as an example.
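The abstract names image-text retrieval as the downstream task but does not say how it is scored. As a rough illustration only, the sketch below computes Recall@K, a common retrieval metric; whether the paper reports exactly this metric is an assumption. The function name `recall_at_k`, the toy score matrix, and the convention that caption i matches image i are all hypothetical and are not taken from the WIT release.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of text queries whose matching image ranks in the top k.

    Assumes similarity[i][j] scores caption i against image j, and that the
    correct image for caption i is image i (a toy convention, not WIT's format).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not similarity:
        raise ValueError("similarity must contain at least one query")
    hits = 0
    for i, row in enumerate(similarity):
        # Rank image indices by descending score for this caption.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Made-up scores for three captions against three images.
toy_scores = [
    [0.9, 0.1, 0.3],  # caption 0: correct image ranked first
    [0.2, 0.4, 0.8],  # caption 1: correct image ranked second
    [0.7, 0.6, 0.1],  # caption 2: correct image ranked last
]

for k in (1, 2, 3):
    print(f"Recall@{k} = {recall_at_k(toy_scores, k):.3f}")
```

In practice the score matrix would come from a trained image-text model; the plain nested lists here just keep the sketch dependency-free.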
