Pre-trained language models (LMs) have made significant advances in various Natural Language Processing (NLP) domains, but it is unclear to what extent they can infer the formal semantics of ontologies, which are often used to represent conceptual knowledge and serve as the schemas of data graphs. To investigate an LM's knowledge of ontologies, we propose OntoLAMA, a set of inference-based probing tasks and datasets built from ontology subsumption axioms that involve both atomic and complex concepts. We conduct extensive experiments on ontologies of different domains and scales, and our results demonstrate that LMs encode relatively less background knowledge for Subsumption Inference (SI) than for traditional Natural Language Inference (NLI), but can improve significantly on SI when a small number of samples are given. We will open-source our code and datasets.
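To make the probing setup concrete: a subsumption axiom such as Chardonnay ⊑ Wine can be verbalized into an NLI-style premise-hypothesis pair and posed to a masked LM as a cloze question, with label words standing in for the entailment decision. The sketch below illustrates this idea; the template, the "Yes"/"No" label words, and the roberta-base checkpoint are illustrative assumptions, not the paper's actual OntoLAMA prompts or evaluation pipeline.

```python
# A minimal sketch of inference-based subsumption probing with a masked LM.
# The verbalization template and the "Yes"/"No" label words below are
# illustrative assumptions, not the paper's actual OntoLAMA prompts.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
model.eval()

# Label words for entailment vs. non-entailment (hypothetical choice).
YES_ID = tokenizer(" Yes", add_special_tokens=False).input_ids[0]
NO_ID = tokenizer(" No", add_special_tokens=False).input_ids[0]

def subsumption_score(sub_concept: str, super_concept: str) -> float:
    """Score the axiom `sub_concept SubClassOf super_concept` by comparing
    P("Yes") and P("No") at the mask of a cloze-style NLI prompt."""
    text = (f"Something is a {sub_concept}? {tokenizer.mask_token}, "
            f"it is a {super_concept}.")
    enc = tokenizer(text, return_tensors="pt")
    mask_idx = (enc.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        probs = model(**enc).logits[0, mask_idx].softmax(-1)
    return (probs[YES_ID] - probs[NO_ID]).item()

# A higher score suggests the LM leans towards entailment.
print(subsumption_score("chardonnay", "wine"))    # expected: relatively high
print(subsumption_score("chardonnay", "animal"))  # expected: relatively low
```

A zero-shot probe of this kind tests what the LM already encodes; the few-shot setting reported in the abstract would additionally fine-tune or prompt-tune on a handful of labeled SI samples before scoring.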