PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition - Zhuanzhi Papers

Decoding · Hidden layer · Extensibility · Model evaluation · Attention mechanism

September 9, 2021

Zhi Qiao, Yu Zhou, Jin Wei, Wei Wang, Yuan Zhang, Ning Jiang, Hongbin Wang, Weiping Wang

From arXiv. Accepted by ACM MM 2021.

Scene text recognition has attracted growing attention because of its wide range of applications. Most state-of-the-art methods adopt an encoder-decoder framework with an attention mechanism, which generates text autoregressively from left to right. Despite their convincing performance, their speed is limited by the one-by-one decoding strategy. In contrast, non-autoregressive models predict all results in parallel with a much shorter inference time, but their accuracy falls considerably behind that of their autoregressive counterparts. In this paper, we propose a Parallel, Iterative and Mimicking Network (PIMNet) to balance accuracy and efficiency. Specifically, PIMNet adopts a parallel attention mechanism to predict the text faster and an iterative generation mechanism to make the predictions more accurate. In each iteration, the context information is fully explored. To improve learning of the hidden layer, we exploit mimicking learning in the training phase: an additional autoregressive decoder is adopted, and the parallel decoder mimics it by fitting the outputs of the hidden layer. Because the two decoders share a backbone, PIMNet can be trained end-to-end without pre-training. During inference, the autoregressive decoder branch is removed for faster speed. Extensive experiments on public benchmarks demonstrate the effectiveness and efficiency of PIMNet. Our code will be available at https://github.com/Pay20Y/PIMNet.
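The abstract does not spell out how the iterative generation mechanism works, so the following is only a toy sketch of one plausible reading: every position is predicted in parallel on each pass, only the most confident open positions are committed, and later passes use the committed characters as context. The commit rule, the `MASK` placeholder, and the names `iterative_parallel_decode` and `make_toy_predictor` are assumptions made for illustration and are not taken from the paper or its repository.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # hypothetical placeholder for a position that is not yet decided

# A predictor stands in for the parallel decoder: given the current partial
# text, it returns one (character, confidence) guess for every position at once.
Predictor = Callable[[List[Optional[str]]], List[Tuple[str, float]]]


def iterative_parallel_decode(predict: Predictor, length: int, iterations: int) -> str:
    """Fill `length` positions over at most `iterations` parallel passes."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    text: List[Optional[str]] = [MASK] * length
    per_pass = -(-length // iterations)  # ceiling division: positions committed per pass
    for _ in range(iterations):
        open_positions = [i for i, ch in enumerate(text) if ch is MASK]
        if not open_positions:
            break
        guesses = predict(text)  # all positions are predicted in one call
        # Assumed rule: commit the most confident open positions first and leave
        # the rest masked so the next pass can condition on the committed ones.
        open_positions.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in open_positions[:per_pass]:
            text[i] = guesses[i][0]
    return "".join(ch for ch in text if ch is not MASK)


def make_toy_predictor(target: str) -> Predictor:
    """Toy decoder: a position is read correctly only if it is the first
    character or its left neighbour has already been committed."""
    def predict(text: List[Optional[str]]) -> List[Tuple[str, float]]:
        guesses = []
        for i, ch in enumerate(target):
            has_context = i == 0 or text[i - 1] is not MASK
            guesses.append((ch, 0.9) if has_context else ("?", 0.1))
        return guesses
    return predict


toy = make_toy_predictor("TEXT")
print(iterative_parallel_decode(toy, length=4, iterations=1))  # T???
print(iterative_parallel_decode(toy, length=4, iterations=4))  # TEXT
```

In this toy setting a single parallel pass gets only the first character right, while four passes recover the whole word, which mirrors the accuracy-versus-speed trade-off the abstract describes. A real model would replace the toy predictor with the learned parallel attention decoder.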

Related content

[DeepMind] Reinforcement learning tutorial, 83-page PPT

Zhuanzhi Member Service · 158+ reads · August 7, 2020

[InterSpeech 2020] Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

Zhuanzhi Member Service · 17+ reads · March 23, 2020

Bag of Tricks for Image Classification, 17-page PPT

Zhuanzhi Member Service · 95+ reads · March 12, 2020

Python data analysis: past, present, and future, 52-page PPT

Zhuanzhi Member Service · 102+ reads · March 9, 2020

[Paper] TENER: Adapting Transformer Encoder for Named Entity Recognition

Zhuanzhi Member Service · 52+ reads · December 28, 2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis, Prof. Quan Xiaojun (权小军), School of Data and Computer Science, Sun Yat-sen University, 8th National Conference on Social Media Processing (SMP 2019)

Zhuanzhi Member Service · 19+ reads · October 22, 2019

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Zhuanzhi Member Service · 49+ reads · October 17, 2019

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Zhuanzhi Member Service · 59+ reads · October 17, 2019

[Survey] Scene text detection and recognition with deep learning

Zhuanzhi Member Service · 78+ reads · October 10, 2019

[AI in 2019: A Year in Review] Anti-AI

Zhuanzhi Member Service · 79+ reads · October 10, 2019

Related news

Transferring Knowledge across Learning Processes

CreateAMind · 29+ reads · May 18, 2019

Deleted

德先生 · 53+ reads · April 28, 2019

Hierarchical Imitation - Reinforcement Learning

CreateAMind · 19+ reads · May 25, 2018

A review of the past year's progress in computer vision

机器学习研究会 (Machine Learning Research Society) · 9+ reads · November 25, 2017

Related papers

A Simple Generative Network

arXiv · 0+ reads · October 31, 2021

Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

arXiv · 4+ reads · March 27, 2020

Active Generative Adversarial Network for Image Classification

arXiv · 4+ reads · June 17, 2019

Local Relation Networks for Image Recognition

arXiv · 4+ reads · April 25, 2019

A sequential guiding network with attention for image captioning

arXiv · 5+ reads · February 8, 2019

Neural Speech Synthesis with Transformer Network

arXiv · 5+ reads · January 30, 2019

ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification

arXiv · 3+ reads · December 14, 2018

Learning to Guide Decoding for Image Captioning

arXiv · 6+ reads · April 3, 2018

Reconstruction Network for Video Captioning

arXiv · 5+ reads · March 30, 2018

LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation

arXiv · 3+ reads · August 2, 2017
