Vision Learners Meet Web Image-Text Pairs (Zhuanzhi Papers)

WEB · SSL · Vision · Learner · state-of-the-art

January 17, 2023

Vision Learners Meet Web Image-Text Pairs

Bingchen Zhao, Quan Cui, Hao Wu, Osamu Yoshie, Cheng Yang

From arXiv. Project page: https://bzhao.me/MUG/

Most recent self-supervised learning (SSL) methods are pre-trained on the well-curated ImageNet-1K dataset. In this work, we consider SSL pre-training on noisy web image-text paired data, motivated by the excellent scalability of web data. First, we conduct a benchmark study of representative SSL pre-training methods on large-scale web data under fair conditions. The methods include single-modal ones such as MAE and multi-modal ones such as CLIP. We observe that multi-modal methods cannot outperform single-modal ones on vision transfer learning tasks. We derive an information-theoretical view to explain the benchmarking results, which provides insights into designing novel vision learners. Inspired by these explorations, we present a visual representation pre-training method, MUlti-modal Generator (MUG), for scalable web image-text data. MUG achieves state-of-the-art transfer performance on a variety of tasks and shows promising scaling behavior. Models and code will be made public. A demo is available at https://huggingface.co/spaces/tennant/MUG_caption
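The benchmark compares a single-modal objective (MAE) with a multi-modal one (CLIP). The sketch below is a rough, hedged illustration of what those two families optimize. It is not MUG, whose loss the abstract does not specify. It is a minimal pure-Python version of two standard objectives:

- CLIP's symmetric contrastive (InfoNCE) loss over a batch of image-text pairs.
- MAE's mean-squared reconstruction error, computed only on masked patches.

A few things are assumptions of this sketch rather than details from the paper:

- The function names (`clip_contrastive_loss`, `random_mask`, `mae_loss`) and the toy inputs are hypothetical.
- The defaults (temperature 0.07, mask ratio 0.75) follow the initial temperature reported for CLIP and the masking ratio reported for MAE.
- In real MAE the reconstruction comes from a decoder that only sees the visible patches. Here it is simply passed in.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """CLIP-style symmetric InfoNCE loss; pair i is (image_emb[i], text_emb[i])."""
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, divided by the temperature
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def diagonal_cross_entropy(rows):
        # Mean of -log softmax(row)[i]; the matching pair sits on the diagonal.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum_exp - row[i]
        return total / len(rows)

    image_to_text = diagonal_cross_entropy(logits)
    # Transposing the logits scores each text against every image in the batch.
    text_to_image = diagonal_cross_entropy([list(col) for col in zip(*logits)])
    return (image_to_text + text_to_image) / 2


def random_mask(num_patches, mask_ratio=0.75, seed=0):
    """Pick which patch indices are hidden from the encoder (MAE-style random masking)."""
    num_masked = int(round(num_patches * mask_ratio))
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_patches), num_masked))


def mae_loss(patches, reconstruction, masked_indices):
    """Mean per-patch MSE, computed only on the masked patches."""
    if not masked_indices:
        raise ValueError("need at least one masked patch")
    per_patch = [
        sum((p - r) ** 2 for p, r in zip(patches[i], reconstruction[i])) / len(patches[i])
        for i in masked_indices
    ]
    return sum(per_patch) / len(per_patch)
```

The two losses ask different things of the visual encoder. The contrastive loss is low once each image is more similar to its own caption than to the other captions in the batch. The masked loss is low only when the hidden pixel content can be recovered, and errors on visible patches do not count at all. For example, `mae_loss(patches, recon, random_mask(len(patches)))` ignores whatever `recon` contains at unmasked positions.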
