Analytical reasoning is an essential and challenging task that requires a system to analyze a scenario involving a set of particular circumstances and reason over it to draw conclusions. In this paper, we study the challenge of analytical reasoning over text and introduce a new dataset consisting of questions from the Law School Admission Test (LSAT) from 1991 to 2016. We analyze what knowledge understanding and reasoning abilities are required to do well on this task. Furthermore, to address this reasoning challenge, we design two different baselines: (1) a Transformer-based method that leverages state-of-the-art pre-trained language models, and (2) the Analytical Reasoning Machine (ARM), a logic-level reasoning framework that extracts symbolic knowledge (e.g., participants, facts, logical functions) to deduce legitimate solutions. In our experiments, we find that the Transformer-based models struggle with this task, with performance close to random guessing, whereas ARM achieves better results by leveraging symbolic knowledge and interpretable reasoning steps. Both methods still lag far behind human performance, leaving ample room for future research.