Making sense of familiar yet new situations typically involves generalizing about causal schemas, stories that help humans reason about event sequences. Reasoning about events includes identifying cause-and-effect relations shared across event instances, a process we refer to as causal schema induction. Statistical schema induction systems may leverage structural knowledge encoded in discourse or in the causal graphs associated with event meaning; however, resources for studying such causal structure are few in number and limited in size. In this work, we investigate how to apply schema induction models to the task of knowledge discovery for enhanced search of English-language news texts. To tackle the problem of data scarcity, we present Torquestra, a manually curated dataset of text-graph-schema units integrating temporal, event, and causal structures. We benchmark our dataset on three knowledge discovery tasks, building and evaluating models for each. Results show that systems that harness causal structure are effective at identifying texts that share similar causal meaning components, rather than relying on lexical cues alone. We make our dataset and models available for research purposes.