Our world is open-ended, non-stationary, and constantly evolving; thus what we talk about and how we talk about it change over time. This inherently dynamic nature of language stands in stark contrast to the current static language modelling paradigm, which constructs training and evaluation sets from overlapping time periods. Despite recent progress, we demonstrate that state-of-the-art Transformer models perform worse in the realistic setup of predicting future utterances from beyond their training period -- a consistent pattern across three datasets from two domains. We find that, while increasing model size alone -- a key driver behind recent progress -- does not provide a solution to the temporal generalization problem, models that continually update their knowledge with new information can indeed slow down the degradation over time. Hence, given the compilation of ever-larger language modelling training datasets, combined with the growing list of language-model-based NLP applications that require up-to-date knowledge about the world, we argue that now is the right time to rethink our static language modelling evaluation protocol and to develop adaptive language models that remain up-to-date with respect to our ever-changing, non-stationary world.
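To make the contrast between the two evaluation protocols concrete, the following is a minimal Python sketch, not code from the paper: a standard static split shuffles timestamped documents so that training and evaluation sets cover overlapping time periods, whereas a temporal split trains only on documents up to a cutoff date and evaluates on documents from beyond it. All function names, field names, and dates here are illustrative assumptions.

```python
# Sketch of static vs. temporal train/test splits for timestamped
# documents. Names, fields, and dates are hypothetical illustrations.
from datetime import date
import random


def static_split(docs, test_fraction=0.1, seed=0):
    """Static protocol: a random shuffle, so train and test sets are
    drawn from the same (overlapping) time period."""
    rng = random.Random(seed)
    shuffled = docs[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def temporal_split(docs, cutoff):
    """Temporal protocol: train on utterances up to the cutoff date and
    evaluate only on utterances from beyond the training period."""
    train = [d for d in docs if d["date"] <= cutoff]
    test = [d for d in docs if d["date"] > cutoff]
    return train, test


if __name__ == "__main__":
    docs = [
        {"date": date(2018, 5, 1), "text": "..."},
        {"date": date(2019, 7, 12), "text": "..."},
        {"date": date(2020, 3, 30), "text": "..."},
        {"date": date(2021, 1, 15), "text": "..."},
    ]
    train, test = temporal_split(docs, cutoff=date(2019, 12, 31))
    print(len(train), "training docs,", len(test), "future test docs")
```

Under the temporal protocol, any perplexity gap between the future test documents and a same-period held-out set directly measures the degradation over time that the abstract describes.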