While a great deal of work has been done on NLP approaches to lexical semantic change detection, other aspects of language change have received less attention from the NLP community. In this paper, we address the detection of sound change through historical spelling. We propose that a sound change can be captured by comparing, through time, the relative distance between the distributions of the characters involved, using PPMI character embeddings. We verify this hypothesis on synthetic data and then test the method's ability to trace a well-known historical change, the lenition of plosives, in Danish historical sources. We show that the models are able to identify several of the changes under consideration and to uncover meaningful contexts in which they appeared. The methodology has the potential to contribute to the study of open questions such as the relative chronology of sound shifts and their geographical distribution.
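As a rough illustration of the idea in the abstract, the sketch below builds PPMI character vectors separately for two time slices and compares how far apart two characters are in each slice. It is a minimal sketch under stated assumptions, not the paper's implementation: the one-character context window, the `#` word-boundary symbol, the use of cosine distance, the helper names, and the toy word lists are all hypothetical choices made for this example, and the words are invented rather than taken from Danish sources.

```python
import numpy as np


def ppmi_char_embeddings(words, window=1):
    """Return (vocab, matrix); matrix[i] is the PPMI vector of vocab[i].

    Assumption: contexts are the characters within `window` positions,
    and '#' marks word boundaries.
    """
    padded = ["#" + w + "#" for w in words]
    vocab = sorted({c for w in padded for c in w})
    index = {c: i for i, c in enumerate(vocab)}

    # Symmetric character/context co-occurrence counts.
    counts = np.zeros((len(vocab), len(vocab)))
    for w in padded:
        for i, c in enumerate(w):
            lo, hi = max(0, i - window), min(len(w), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[c], index[w[j]]] += 1

    # PMI = log2(observed / expected); negative and undefined cells become 0.
    total = counts.sum()
    expected = counts.sum(axis=1, keepdims=True) * counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(counts / expected)
    ppmi = np.where(counts > 0, np.maximum(pmi, 0.0), 0.0)
    return vocab, ppmi


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return 1.0 if denom == 0 else 1.0 - float(u @ v) / denom


def pair_distance(words, a, b, window=1):
    """Distance between the PPMI vectors of characters a and b in one time slice."""
    vocab, ppmi = ppmi_char_embeddings(words, window)
    return cosine_distance(ppmi[vocab.index(a)], ppmi[vocab.index(b)])


# Invented toy slices: in the later one, <b> starts to appear where <p> used to.
early = ["gape", "sape", "bo", "bor"]
late = ["gabe", "sape", "bo", "bor"]

d_early = pair_distance(early, "p", "b")
d_late = pair_distance(late, "p", "b")
print(f"distance(p, b): early={d_early:.3f}, late={d_late:.3f}")
```

Because the distance is computed within each slice and only the resulting scalars are compared, this sketch does not need to align embedding spaces across periods. In the toy data, `p` and `b` share no contexts in the early slice, while in the late slice `b` also occurs between `a` and `e`, so the two characters move closer together.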