Relational Representation Learning in Visually-Rich Documents

Relational understanding is critical for many visually-rich document (VRD) understanding tasks. Through multi-modal pre-training, recent studies provide comprehensive contextual representations and exploit them as prior knowledge for downstream tasks. Despite their impressive results, we observe that the widespread relational hints built upon contextual knowledge (e.g., the relation between key and value fields on receipts) have not yet been exploited. To close this gap, we propose DocReL, a Document Relational Representation Learning framework. The major challenge for DocReL lies in the variety of relations. From the simplest pairwise relation to complex global structure, supervised training is infeasible because the definition of a relation varies, and even conflicts, across tasks. To deal with this unpredictable definition of relations, we propose a novel contrastive learning task named Relational Consistency Modeling (RCM), which harnesses the fact that existing relations should be consistent across differently augmented positive views. RCM provides relational representations that are more compatible with the needs of downstream tasks, even without any knowledge of the exact definition of a relation. DocReL achieves better performance on a wide variety of VRD relational understanding tasks, including table structure recognition, key information extraction, and reading order detection.
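
To make the idea of relational consistency concrete, the following is a minimal sketch and not the DocReL implementation: the abstract does not specify the relation features or the loss. It assumes, purely for illustration, that a relation between two document entities is the normalized difference of their embeddings, and that consistency is scored with an InfoNCE-style contrastive loss in which the same entity pair in the other augmented view is the positive and all other pairs are negatives. The names `relation_feature`, `relational_consistency_loss`, and `temperature` are hypothetical.

```python
import math
from itertools import permutations


def relation_feature(a, b):
    """Hypothetical relation feature (an assumption, not from the paper):
    the L2-normalized difference between two entity embeddings."""
    diff = [x - y for x, y in zip(a, b)]
    norm = math.sqrt(sum(d * d for d in diff)) or 1.0  # guard against zero vectors
    return [d / norm for d in diff]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def relational_consistency_loss(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss over ordered entity pairs (illustrative assumption).

    view_a and view_b hold one embedding per entity, in the same entity order,
    computed from two differently augmented views of one document. The relation
    of pair (i, j) in view_a should match the relation of (i, j) in view_b
    better than it matches any other pair in view_b.
    """
    pairs = list(permutations(range(len(view_a)), 2))
    rel_a = [relation_feature(view_a[i], view_a[j]) for i, j in pairs]
    rel_b = [relation_feature(view_b[i], view_b[j]) for i, j in pairs]

    total = 0.0
    for k, anchor in enumerate(rel_a):
        logits = [dot(anchor, candidate) / temperature for candidate in rel_b]
        # Numerically stable log-sum-exp over all candidate relations.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[k]  # -log softmax of the positive
    return total / len(pairs)


# Toy usage: three entities with 2-D embeddings.
original = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# A shifted view keeps every pairwise relation intact.
consistent = [[x + 5.0, y + 5.0] for x, y in original]
# Reassigning embeddings to the wrong entities breaks every relation.
inconsistent = [original[1], original[2], original[0]]

loss_consistent = relational_consistency_loss(original, consistent)
loss_inconsistent = relational_consistency_loss(original, inconsistent)
```

In this toy setup the loss is lower when the second view preserves the pairwise relations than when it scrambles them, which is the property a consistency objective is meant to reward. No relation labels are used anywhere, mirroring the claim that RCM needs no knowledge of the exact definition of a relation.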
