End-to-end text spotting aims to integrate scene text detection and recognition into a unified framework. Handling the relationship between the two sub-tasks plays a pivotal role in designing effective spotters. Although Transformer-based methods eliminate the heuristic post-processing, they still suffer from the synergy issue between the sub-tasks and low training efficiency. In this paper, we present DeepSolo, a simple DETR-like baseline that lets a single Decoder with Explicit Points Solo for text detection and recognition simultaneously. Technically, for each text instance, we represent the character sequence as ordered points and model them with learnable explicit point queries. After passing through a single decoder, the point queries have encoded the requisite text semantics and locations, and can thus be further decoded into the center line, boundary, script, and confidence of the text via very simple prediction heads in parallel. We also introduce a text-matching criterion to deliver more accurate supervisory signals, thus enabling more efficient training. Quantitative experiments on public benchmarks demonstrate that DeepSolo outperforms previous state-of-the-art methods and achieves better training efficiency. In addition, DeepSolo is compatible with line annotations, which require much lower annotation cost than polygons. The code is available at https://github.com/ViTAE-Transformer/DeepSolo.
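To make the "explicit point query" idea concrete, below is a minimal sketch in plain PyTorch: a set of learnable point queries attends to encoder features through a single Transformer decoder, and the decoded queries are then fed to simple parallel heads for the center line, boundary, character class (script), and confidence. All module names, head designs, and hyperparameters here (PointQuerySpotter, num_points, num_chars, etc.) are illustrative assumptions, not the authors' implementation; see the official repository for the actual code.

```python
# Hypothetical sketch of point queries decoded by a single decoder with
# parallel prediction heads. Shapes and names are assumptions for illustration.
import torch
import torch.nn as nn


class PointQuerySpotter(nn.Module):
    def __init__(self, num_points=25, d_model=256, num_chars=97, num_layers=6):
        super().__init__()
        # One learnable query per ordered point sampled along a text instance.
        self.point_queries = nn.Embedding(num_points, d_model)
        decoder_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=num_layers)
        # Very simple prediction heads applied to the decoded queries in parallel.
        self.center_head = nn.Linear(d_model, 2)        # (x, y) of each center-line point
        self.boundary_head = nn.Linear(d_model, 4)      # offsets to top/bottom boundary points
        self.char_head = nn.Linear(d_model, num_chars)  # character (script) classification
        self.conf_head = nn.Linear(d_model, 1)          # confidence score

    def forward(self, image_features):
        # image_features: (batch, num_tokens, d_model) from some encoder,
        # e.g. a DETR-style Transformer encoder over backbone features.
        b = image_features.size(0)
        queries = self.point_queries.weight.unsqueeze(0).expand(b, -1, -1)
        decoded = self.decoder(queries, image_features)  # queries attend to image features
        return {
            "center_points": self.center_head(decoded).sigmoid(),
            "boundary_offsets": self.boundary_head(decoded),
            "char_logits": self.char_head(decoded),
            "confidence": self.conf_head(decoded).sigmoid(),
        }


if __name__ == "__main__":
    model = PointQuerySpotter()
    feats = torch.randn(2, 196, 256)  # dummy encoder output
    out = model(feats)
    print({k: v.shape for k, v in out.items()})
```

The point of the sketch is the structure rather than the exact details: detection and recognition share one decoder and one set of point queries, with the sub-tasks separated only at the lightweight parallel heads.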