Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

The dominant NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction-based question answering, or machine translation). However, it builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time. This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information. Moreover, it is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime. The first goal of this thesis is to characterize the different forms this shift can take in the context of natural language processing, and to propose benchmarks and evaluation metrics to measure its effect on current deep learning architectures. We then take steps to mitigate the effect of distributional shift on NLP models. To this end, we develop methods based on parametric reformulations of the distributionally robust optimization framework. Empirically, we demonstrate that these approaches yield more robust models on a selection of realistic problems. In the third and final part of this thesis, we explore ways of efficiently adapting existing models to new domains or tasks. Our contribution to this topic takes inspiration from information geometry to derive a new gradient update rule which alleviates catastrophic forgetting issues during adaptation.

