Image Transformers have recently achieved significant progress in natural image understanding, using either supervised (ViT, DeiT, etc.) or self-supervised (BEiT, MAE, etc.) pre-training techniques. In this paper, we propose DiT, a self-supervised pre-trained Document Image Transformer model that uses large-scale unlabeled text images for Document AI tasks. Self-supervision is essential here, since no supervised counterpart exists due to the lack of human-labeled document images. We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection, and text detection for OCR. Experimental results illustrate that the self-supervised pre-trained DiT model achieves new state-of-the-art results on these downstream tasks, e.g., document image classification (91.11 $\rightarrow$ 92.69), document layout analysis (91.0 $\rightarrow$ 94.9), table detection (94.23 $\rightarrow$ 96.55), and text detection for OCR (93.07 $\rightarrow$ 94.29). The code and pre-trained models are publicly available at \url{https://aka.ms/msdit}.