Natural Language Understanding (NLU) is a vital component of dialogue systems, and its ability to detect Out-of-Domain (OOD) inputs is critical in practical applications, since accepting an OOD input that the current system does not support may lead to catastrophic failure. However, most existing OOD detection methods rely heavily on manually labeled OOD samples and cannot take full advantage of unlabeled data, which limits their feasibility in practical applications. In this paper, we propose a novel model that generates high-quality pseudo OOD samples akin to In-Domain (IND) input utterances, thereby improving the performance of OOD detection. To this end, an autoencoder is trained to map an input utterance into a latent code, and the codes of IND and OOD samples are trained to be indistinguishable by utilizing a generative adversarial network. To provide more supervision signals, an auxiliary classifier is introduced to regularize the generated OOD samples to have indistinguishable intent labels. Experiments show that the pseudo OOD samples generated by our model can be used to effectively improve OOD detection in NLU. Moreover, we demonstrate that the effectiveness of these pseudo OOD data can be further improved by efficiently utilizing unlabeled data.
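The two supervision signals described above can be sketched numerically: a standard GAN objective that pushes generated latent codes toward being indistinguishable from IND codes, and an auxiliary-classifier regularizer that pushes the intent distribution of generated samples toward uniform (one concrete reading of "indistinguishable intent labels"). This is a minimal NumPy sketch of the loss terms only, not the authors' implementation; the function names and the KL-to-uniform formulation are illustrative assumptions.

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-8):
    """Standard (non-saturating) GAN losses on discriminator outputs.

    d_real: discriminator probabilities for real IND latent codes.
    d_fake: discriminator probabilities for generated pseudo-OOD codes.
    The generator loss falls as the discriminator is fooled, i.e. as the
    generated codes become indistinguishable from IND codes.
    """
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    g_loss = -np.mean(np.log(d_fake + eps))
    return d_loss, g_loss

def uniform_intent_regularizer(logits, eps=1e-8):
    """KL(p || uniform) averaged over a batch of intent logits.

    Minimizing this term pushes the auxiliary classifier's predicted
    intent distribution for generated samples toward uniform, so pseudo
    OOD samples carry no identifiable intent label (an assumption about
    how the regularizer is formulated).
    """
    # numerically stable softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    k = probs.shape[1]
    # KL(p || U) = sum_i p_i log p_i + log k ; zero iff p is uniform
    return np.mean((probs * np.log(probs + eps)).sum(axis=1) + np.log(k))
```

For example, a batch of all-zero logits (already uniform) yields a regularizer of zero, while sharply peaked logits yield a value near log k, giving the generator a gradient toward intent-ambiguous samples.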