DocParser: 文件提交文件的等级结构分析 (DocParser: Hierarchical Structure Parsing of Document Renderings) - 专知论文

会员服务 ·

0

Extensibility · entity · Principle · 查准率/准确率 · Performance ·

2021 年 1 月 25 日

DocParser: Hierarchical Structure Parsing of Document Renderings

翻译：DocParser: 文件提交文件的等级结构分析

Johannes Rausch,Octavio Martinez,Fabian Bissig,Ce Zhang,Stefan Feuerriegel

from arxiv, AAAI 2021

Translating renderings (e. g. PDFs, scans) into hierarchical document structures is extensively demanded in the daily routines of many real-world applications. However, a holistic, principled approach to inferring the complete hierarchical structure of documents is missing. As a remedy, we developed "DocParser": an end-to-end system for parsing the complete document structure - including all text elements, nested figures, tables, and table cell structures. Our second contribution is to provide a dataset for evaluating hierarchical document structure parsing. Our third contribution is to propose a scalable learning framework for settings where domain-specific data are scarce, which we address by a novel approach to weak supervision that significantly improves the document structure parsing performance. Our experiments confirm the effectiveness of our proposed weak supervision: Compared to the baseline without weak supervision, it improves the mean average precision for detecting document entities by 39.1 % and improves the F1 score of classifying hierarchical relations by 35.8 %.

翻译：在许多真实世界应用程序的日常日常工作中,大量要求将转换成等级文档结构(如PDFs、扫描等)。然而,缺少一种全面、有原则的办法来推断文件的完整等级结构。作为一种补救措施,我们开发了“DocParker”:一个端到端系统,用于分析完整的文档结构,包括所有文字元素、嵌套图、表格和表格单元格结构。我们的第二个贡献是为评价等级文档结构的剖析提供一个数据集。我们的第三个贡献是为缺少特定领域数据的设置提供一个可缩放的学习框架,我们通过新颖的对薄弱的监督方法加以解决,该方法大大改进了文件结构的剖析性。我们的实验证实了我们提议的薄弱监督的有效性:与基线相比,没有薄弱的监督,它提高了测出文件实体的平均精确度39.1%,将等级关系分类的F1分提高35.8%。

0

相关内容

Extensibility

iOS 8 提供的应用间和应用跟系统的功能交互特性。

Today (iOS and OS X): widgets for the Today view of Notification Center
Share (iOS and OS X): post content to web services or share content with others
Actions (iOS and OS X): app extensions to view or manipulate inside another app
Photo Editing (iOS): edit a photo or video in Apple's Photos app with extensions from a third-party apps
Finder Sync (OS X): remote file storage in the Finder with support for Finder content annotation
Storage Provider (iOS): an interface between files inside an app and other apps on a user's device
Custom Keyboard (iOS): system-wide alternative keyboards

Source: iOS 8 Extensions: Apple’s Plan for a Powerful App Ecosystem

【浙江大学】计算摄影学 (Computational Photography)课程

【浙江大学】计算摄影学 (Computational Photography)课程

专知会员服务

28+阅读 · 2020年12月26日

2020数据工程师成长路线图

专知会员服务

19+阅读 · 2020年9月6日

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

专知会员服务

111+阅读 · 2020年6月10日

《C++ Primer中文版第5版》电子书与学习笔记和课后练习答案

《C++ Primer中文版第5版》电子书与学习笔记和课后练习答案

专知会员服务

274+阅读 · 2020年2月13日

【论文】结构GANs，Structured GANs，

【论文】结构GANs，Structured GANs，

专知会员服务

15+阅读 · 2020年1月16日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

已删除

将门创投

7+阅读 · 2020年3月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

论文浅尝 | Zero-Shot Transfer Learning for Event Extraction

论文浅尝 | Zero-Shot Transfer Learning for Event Extraction

开放知识图谱

26+阅读 · 2018年11月1日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Hierarchical Metadata-Aware Document Categorization under Weak Supervision

Arxiv

4+阅读 · 2020年10月26日

Hierarchical Graph Pooling with Structure Learning

Arxiv

13+阅读 · 2019年11月14日

Hierarchy Parsing for Image Captioning

Hierarchy Parsing for Image Captioning

Arxiv

6+阅读 · 2019年9月10日

End-to-End Learning for Answering Structured Queries Directly over Text

Arxiv

3+阅读 · 2018年11月16日

A Deep Structure of Person Re-Identification using Multi-Level Gaussian Models

Arxiv

3+阅读 · 2018年5月20日

Hierarchical Reinforcement Learning with Deep Nested Agents

Arxiv

3+阅读 · 2018年5月18日

Learning to Rank Question-Answer Pairs using Hierarchical Recurrent Encoder with Latent Topic Clustering

Arxiv

5+阅读 · 2018年4月9日

Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences

Arxiv

10+阅读 · 2018年3月27日

Scene Graph Parsing as Dependency Parsing

Arxiv

6+阅读 · 2018年3月25日

Hierarchical Label Inference for Video Classification

Arxiv

6+阅读 · 2018年1月21日

VIP会员

文章信息

相关主题

查准率/准确率

相关VIP内容

【浙江大学】计算摄影学 (Computational Photography)课程

【浙江大学】计算摄影学 (Computational Photography)课程

专知会员服务

28+阅读 · 2020年12月26日

2020数据工程师成长路线图

专知会员服务

19+阅读 · 2020年9月6日

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

专知会员服务

111+阅读 · 2020年6月10日

《C++ Primer中文版第5版》电子书与学习笔记和课后练习答案

《C++ Primer中文版第5版》电子书与学习笔记和课后练习答案

专知会员服务

274+阅读 · 2020年2月13日

【论文】结构GANs，Structured GANs，

【论文】结构GANs，Structured GANs，

专知会员服务

15+阅读 · 2020年1月16日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

AI Agent、传统聊天机器人有何区别？如何评测？这篇30页综述讲明白了

【普林斯顿博士论文】迈向原则化的强化学习

基于多模态大模型的具身智能体研究进展与展望

CVPR2025 | ODE：多模态大语言模型幻觉的开集动态评估框架

相关资讯

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

已删除

将门创投

7+阅读 · 2020年3月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

IEEE | DSC 2019诚邀稿件 (EI检索)

IEEE | DSC 2019诚邀稿件 (EI检索)

Call4Papers

10+阅读 · 2019年2月25日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

论文浅尝 | Zero-Shot Transfer Learning for Event Extraction

论文浅尝 | Zero-Shot Transfer Learning for Event Extraction

开放知识图谱

26+阅读 · 2018年11月1日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

Hierarchical Metadata-Aware Document Categorization under Weak Supervision

Arxiv

4+阅读 · 2020年10月26日

Hierarchical Graph Pooling with Structure Learning

Arxiv

13+阅读 · 2019年11月14日

Hierarchy Parsing for Image Captioning

Hierarchy Parsing for Image Captioning

Arxiv

6+阅读 · 2019年9月10日

End-to-End Learning for Answering Structured Queries Directly over Text

Arxiv

3+阅读 · 2018年11月16日

A Deep Structure of Person Re-Identification using Multi-Level Gaussian Models

Arxiv

3+阅读 · 2018年5月20日

Hierarchical Reinforcement Learning with Deep Nested Agents

Arxiv

3+阅读 · 2018年5月18日

Learning to Rank Question-Answer Pairs using Hierarchical Recurrent Encoder with Latent Topic Clustering

Arxiv

5+阅读 · 2018年4月9日

Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences

Arxiv

10+阅读 · 2018年3月27日

Scene Graph Parsing as Dependency Parsing

Arxiv

6+阅读 · 2018年3月25日

Hierarchical Label Inference for Video Classification

Arxiv

6+阅读 · 2018年1月21日

微信扫码咨询专知VIP会员