FinQA: A Dataset of Numerical Reasoning over Financial Data

The sheer volume of financial statements makes it difficult for humans to access and analyze a business's financials. Robust numerical reasoning likewise faces unique challenges in this domain. In this work, we focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents. In contrast to existing tasks in the general domain, the finance domain involves complex numerical reasoning and an understanding of heterogeneous representations. To facilitate analytical progress, we propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts. We also annotate the gold reasoning programs to ensure full explainability. We further introduce baselines and conduct comprehensive experiments on our dataset. The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge and in complex multi-step numerical reasoning on that knowledge. Our dataset, the first of its kind, should therefore enable significant new community research into complex application domains. The dataset and code are publicly available at https://github.com/czyssrs/FinQA.
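To make the idea of an annotated, multi-step reasoning program more concrete, here is a minimal sketch in Python. The abstract does not specify the program notation, so everything here is an assumption made for illustration: the step syntax `op(arg, arg)`, the `#k` back-reference to the result of step k, the operation names, the helper `run_program`, and the numbers in the usage example.

```python
import re

# Hypothetical operation table; the names are illustrative assumptions,
# not taken from the abstract.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

# Matches one step of the assumed form: name(arg, arg)
STEP = re.compile(r"(\w+)\(([^()]*)\)")


def run_program(program: str) -> float:
    """Run a comma-separated chain of steps and return the last result.

    An argument written as '#k' refers to the result of step k (0-based),
    which is what makes every intermediate value inspectable.
    """
    results = []
    for op_name, arg_text in STEP.findall(program):
        args = []
        for token in arg_text.split(","):
            token = token.strip()
            if token.startswith("#"):
                # Back-reference to an earlier step's result
                args.append(results[int(token[1:])])
            else:
                args.append(float(token))
        results.append(OPS[op_name](*args))
    if not results:
        raise ValueError("program contains no steps")
    return results[-1]


# Made-up example: relative change from 100 to 120
change = run_program("subtract(120, 100), divide(#0, 100)")
```

Because each step is explicit, a reader can check where every number comes from, which is the kind of explainability a gold program annotation is meant to give.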

