RDU: A Region-based Approach to Form-style Document Understanding

Key Information Extraction (KIE) aims to extract structured information (e.g., key-value pairs) from form-style documents (e.g., invoices), an important step toward intelligent document understanding. Previous approaches generally treat KIE as sequence tagging. This struggles with non-flat sequences, especially in documents that mix tables and text. These approaches also require a fixed label set to be pre-defined for each document type, and they suffer from label imbalance. In this work, we assume that Optical Character Recognition (OCR) has already been applied to the input documents. We then reformulate KIE as a region prediction problem in two-dimensional (2D) space, given a target field. Under this new setup, we develop a KIE model named Region-based Document Understanding (RDU). It takes the text content of a document and the corresponding coordinates as input, and it predicts the result by localizing a bounding-box-like region. RDU first applies a layout-aware BERT equipped with a soft layout attention masking and bias mechanism to incorporate layout information into the representations. Next, a Region Proposal Module generates a list of candidate regions from these representations. This module is inspired by computer vision models widely used for object detection. Finally, a Region Categorization Module judges whether each proposed region is valid, and a Region Selection Module selects the region with the largest probability from all proposed regions. Experiments on four types of form-style documents show that the proposed method achieves impressive results. In addition, RDU can be trained on different document types seamlessly, which is especially helpful for low-resource document types.
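The central idea above is that the answer for a target field is a 2D region over OCR tokens. That region is chosen from candidate proposals: each candidate is first judged valid or invalid, and then the remaining candidates are ranked. The following is a minimal Python sketch of only this final stage, and it rests on several assumptions. It does not implement the layout-aware BERT or the Region Proposal Module. The proposals are assumed to be already conditioned on one target field, and their probabilities are hand-written stand-ins for model outputs. Several other details are hypothetical choices for illustration rather than RDU's actual method: the rule that a token belongs to a region when its box center falls inside it, the 0.5 validity threshold, the naive reading order, and all names (`Token`, `Proposal`, `select_region`, `tokens_in_region`).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates


@dataclass
class Token:
    text: str  # OCR text of one token
    box: Box   # OCR bounding box of that token


@dataclass
class Proposal:
    region: Box
    valid_prob: float   # hypothetical stand-in for a region categorization score
    select_prob: float  # hypothetical stand-in for a region selection score


def center_inside(box: Box, region: Box) -> bool:
    """Hypothetical membership rule: the token's box center lies inside the region."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return region[0] <= cx <= region[2] and region[1] <= cy <= region[3]


def tokens_in_region(tokens: List[Token], region: Box) -> str:
    """Read out the OCR text covered by a region, in a naive top-to-bottom, left-to-right order."""
    inside = [t for t in tokens if center_inside(t.box, region)]
    inside.sort(key=lambda t: (t.box[1], t.box[0]))
    return " ".join(t.text for t in inside)


def select_region(proposals: List[Proposal], threshold: float = 0.5) -> Optional[Box]:
    """Keep proposals judged valid, then return the one with the largest selection probability."""
    valid = [p for p in proposals if p.valid_prob >= threshold]
    if not valid:
        return None  # no valid region: treat the target field as absent
    return max(valid, key=lambda p: p.select_prob).region


if __name__ == "__main__":
    # Toy invoice tokens and made-up proposal scores for a hypothetical "invoice number" field.
    tokens = [
        Token("Invoice", (10, 10, 60, 20)),
        Token("No:", (65, 10, 85, 20)),
        Token("INV-001", (90, 10, 140, 20)),
        Token("Total:", (10, 40, 50, 50)),
        Token("$42.00", (55, 40, 100, 50)),
    ]
    proposals = [
        Proposal((88, 8, 145, 22), valid_prob=0.9, select_prob=0.7),
        Proposal((50, 38, 105, 52), valid_prob=0.8, select_prob=0.2),
        Proposal((0, 0, 200, 60), valid_prob=0.3, select_prob=0.9),  # filtered out as invalid
    ]
    region = select_region(proposals)
    if region is not None:
        print(tokens_in_region(tokens, region))  # INV-001
```

The third proposal has the highest selection score, but it is discarded because its validity score falls below the threshold. This mirrors the two-step judge-then-select design described in the abstract. Because the output is a region rather than a per-token tag sequence, a value that spans several lines or table cells can be read out the same way.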
