Word meaning changes over time, depending on linguistic and extra-linguistic factors. Identifying a word's correct meaning in its historical context is a critical challenge in diachronic research, and is relevant to a range of NLP tasks, including information retrieval and semantic search in historical texts. Bayesian models for semantic change have emerged as a powerful tool to address this challenge, providing explicit and interpretable representations of semantic change phenomena. However, although corpora typically come with rich metadata, existing models cannot exploit contextual information (such as text genre) beyond the document time-stamp. This limitation is particularly critical for ancient languages, where scarce data and a long diachronic span make it harder to distinguish polysemy from semantic change, and where current systems perform poorly. We develop GASC, a dynamic semantic change model that leverages categorical metadata about the texts' genre to improve inference and uncover the evolution of meanings in Ancient Greek corpora. In a new evaluation framework, we show that our model achieves improved predictive performance compared to the state of the art.