Anomaly detection algorithms are often considered limited because they do not facilitate the process by which domain experts validate the results. In contrast, deep learning algorithms for anomaly detection, such as autoencoders, point out the outliers, saving experts the time-consuming task of examining normal cases in order to find anomalies. Most outlier detection algorithms output a score for each instance in the database; the top-k most intense outliers are returned to the user for further inspection. However, manual validation of the results becomes challenging without additional clues. An explanation of why an instance is anomalous enables the experts to focus their investigation on the most important anomalies and may increase their trust in the algorithm. Recently, a game theory-based framework known as SHapley Additive exPlanations (SHAP) has been shown to be effective in explaining various supervised learning models. In this research, we extend SHAP to explain anomalies detected by an autoencoder, an unsupervised model. The proposed method extracts and visually depicts both the features that contributed most to the anomaly and those that offset it. A preliminary experimental study using real-world data demonstrates the usefulness of the proposed method in helping domain experts understand the anomaly and filter out uninteresting anomalies, with the aim of minimizing the false positive rate of detected anomalies.
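To make the idea of attributing an anomaly score to individual features concrete, the following is a minimal sketch of exact Shapley-value attribution. It is not the paper's method: the anomaly score here is a hypothetical stand-in (squared reconstruction error against feature-wise training means, in place of a trained autoencoder), and the exact subset enumeration shown only scales to a handful of features, whereas SHAP approximates these values efficiently. The names `anomaly_score`, `background`, and `shapley_values` are illustrative, not from the paper.

```python
from itertools import combinations
from math import factorial
import numpy as np

# Hypothetical stand-in for a trained autoencoder's anomaly score:
# squared reconstruction error against the feature-wise training means.
background = np.array([0.0, 0.0, 0.0, 0.0])  # assumed training-data means

def anomaly_score(x):
    return float(np.sum((x - background) ** 2))

def shapley_values(x, score):
    """Exact Shapley value of each feature for score(x).

    A feature 'absent' from a coalition S is imputed from the background;
    phi[i] averages feature i's marginal contribution over all coalitions.
    """
    d = len(x)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(d):
            for S in combinations(others, r):
                with_i = background.copy()
                with_i[list(S) + [i]] = x[list(S) + [i]]
                without_i = background.copy()
                without_i[list(S)] = x[list(S)]
                weight = factorial(r) * factorial(d - r - 1) / factorial(d)
                phi[i] += weight * (score(with_i) - score(without_i))
    return phi

x = np.array([3.0, -0.1, 0.2, 1.5])  # an "anomalous" instance
phi = shapley_values(x, anomaly_score)
# Features with large positive phi contribute most to the anomaly score;
# negative phi would indicate offsetting features. With this purely additive
# toy score, phi[i] reduces to (x[i] - background[i])**2, and the values
# sum to the instance's total anomaly score (the efficiency property).
```

With a real autoencoder the score is not additive across features, which is exactly when offsetting features (negative attributions) appear and when SHAP's approximation, rather than full enumeration, becomes necessary.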