Towards End-to-end Handwritten Document Recognition - Zhuanzhi Papers


End-to-end · Document recognition · State-of-the-art · Mutually independent · Fully convolutional network

September 30, 2022

Towards End-to-end Handwritten Document Recognition


From arXiv, Ph.D. thesis

Handwritten text recognition has been widely studied over the last decades for its numerous applications. Nowadays, the state-of-the-art approach consists of a three-step process: the document is segmented into text lines, which are then ordered and recognized. However, this three-step approach has many drawbacks. The three steps are treated independently even though they are closely related. Errors accumulate from one step to the next. The ordering step is based on heuristic rules, which prevents its use for documents with complex layouts or for heterogeneous documents. The need for additional physical segmentation annotations to train the segmentation stage is inherent to this approach. In this thesis, we propose to tackle these issues by performing handwritten text recognition of whole documents in an end-to-end way. To this end, we gradually increase the difficulty of the recognition task, moving from isolated lines to paragraphs, and then to whole documents. We proposed an approach at the line level, based on a fully convolutional network, in order to design a first generic feature extraction step for the handwriting recognition task. Based on this preliminary work, we studied two different approaches to recognize handwritten paragraphs. We reached state-of-the-art results at paragraph level on the RIMES 2011, IAM and READ 2016 datasets and outperformed the line-level state of the art on these datasets. Finally, we proposed the first end-to-end approach dedicated to the recognition of both text and layout at document level. Characters and layout tokens are predicted sequentially, following a learned reading order. We also proposed two new metrics, which we used to evaluate this task on the RIMES 2009 and READ 2016 datasets at page level and double-page level.
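
The limitation of a rule-based ordering step can be illustrated with a small sketch. Everything below is an assumption made for illustration: the `Line` record, the coordinates, and the "top-to-bottom, then left-to-right" rule in `heuristic_order` are hypothetical and are not taken from the thesis or from any particular system. The rule reads a single-column page correctly, but on a two-column page it interleaves the columns.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical segmented text line: top-left corner plus its transcription."""
    x: int
    y: int
    text: str


def heuristic_order(lines):
    """Illustrative rule (an assumption): sort top-to-bottom, then left-to-right."""
    return sorted(lines, key=lambda line: (line.y, line.x))


def read_page(lines):
    """Join the line transcriptions in the heuristic reading order."""
    return " ".join(line.text for line in heuristic_order(lines))


# Single-column page: the rule recovers the intended order.
single = [Line(0, 20, "second"), Line(0, 0, "first")]

# Two-column page: the intended order is A1 A2 B1 B2 (left column, then right),
# but the rule alternates between the columns row by row.
two_col = [
    Line(0, 0, "A1"), Line(0, 20, "A2"),
    Line(100, 0, "B1"), Line(100, 20, "B2"),
]

print(read_page(single))   # first second
print(read_page(two_col))  # A1 B1 A2 B2
```

A more elaborate rule could handle this particular layout, but each new layout tends to need its own rule, which is the motivation the abstract gives for learning the reading order instead.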

