Anomaly detection, also called outlier detection, is a common task across many domains and has attracted significant research effort in recent years. Existing work focuses mainly on structured data, such as numerical or categorical data; anomaly detection on unstructured textual data has received far less attention. In this work, we target the textual anomaly detection problem and propose a deep anomaly-injected support vector data description (AI-SVDD) framework. AI-SVDD not only learns a more compact representation of the data hypersphere but also uses a small number of known anomalies to increase its discriminative power. To handle text input, we employ a multilayer perceptron (MLP) network in conjunction with BERT to obtain enriched text representations. We conduct experiments on three text anomaly detection applications with multiple datasets. Experimental results indicate that the proposed AI-SVDD is promising and outperforms existing methods.
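The idea of a compact hypersphere that also uses a few known anomalies can be illustrated with a small sketch. The abstract does not state the exact objective, so everything below is an assumption: normal samples are pulled toward a center by their squared distance, and known anomalies are pushed away with an inverse-distance penalty (a choice borrowed from semi-supervised SVDD variants, not confirmed as the AI-SVDD loss). The function names, `eta`, and `eps` are hypothetical, and the plain lists of floats stand in for the embeddings that the MLP over BERT would produce.

```python
from typing import List, Sequence


def squared_distance(z: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance between an embedding and the center."""
    return sum((zi - ci) ** 2 for zi, ci in zip(z, c))


def hypersphere_center(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Assumed choice of center: the mean of the normal embeddings."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(z[d] for z in embeddings) / n for d in range(dim)]


def anomaly_injected_loss(
    embeddings: Sequence[Sequence[float]],
    is_anomaly: Sequence[bool],
    center: Sequence[float],
    eta: float = 1.0,   # hypothetical weight on the known-anomaly term
    eps: float = 1e-6,  # guards against division by zero
) -> float:
    """Mean loss: normals are pulled toward the center, anomalies pushed away."""
    total = 0.0
    for z, flag in zip(embeddings, is_anomaly):
        d2 = squared_distance(z, center)
        if flag:
            # Known anomaly: the penalty shrinks as the point moves away.
            total += eta / (d2 + eps)
        else:
            # Normal sample: the penalty grows with distance from the center.
            total += d2
    return total / len(embeddings)


def anomaly_score(z: Sequence[float], center: Sequence[float]) -> float:
    """At test time, a larger distance from the center means more anomalous."""
    return squared_distance(z, center)


# Toy embeddings standing in for MLP(BERT(text)) outputs.
normal = [[0.0, 0.0], [2.0, 0.0]]
known_anomaly = [1.0, 2.0]
center = hypersphere_center(normal)
loss = anomaly_injected_loss(
    normal + [known_anomaly], [False, False, True], center
)
```

In a trainable version, the embeddings would come from a network whose parameters are updated to reduce this loss; the sketch only evaluates the objective on fixed vectors.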