Backdooring Explainable Machine Learning

Explainable machine learning holds great potential for analyzing and understanding learning-based systems. These methods can, however, be manipulated to present unfaithful explanations, giving rise to powerful and stealthy adversaries. In this paper, we demonstrate blinding attacks that can fully disguise an ongoing attack against the machine learning model. Similar to neural backdoors, we alter the model's prediction when a trigger is present, but simultaneously also fool the explanation that is provided. This enables an adversary to hide the presence of the trigger or to point the explanation to entirely different portions of the input, throwing a red herring. We analyze different manifestations of such attacks for different explanation types in the image domain, before conducting a red-herring attack against malware classification.
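To make the idea in the abstract concrete, the following is a minimal sketch of what a joint objective for such an attack could look like. It is not the paper's formulation: the function names, the mean-squared-error distance between saliency maps, and the weighting factor `lam` are all assumptions chosen for illustration. The sketch only shows the three ingredients the abstract names: correct behaviour on clean inputs, a changed prediction when the trigger is present, and an explanation steered toward an attacker-chosen map.

```python
import numpy as np


def stamp_trigger(image, trigger, top=0, left=0):
    """Return a copy of `image` with `trigger` pasted at (top, left)."""
    out = image.copy()
    h, w = trigger.shape[:2]
    out[top:top + h, left:left + w] = trigger
    return out


def cross_entropy(probs, label):
    """Cross-entropy of one probability vector against an integer label."""
    return -float(np.log(probs[label] + 1e-12))


def explanation_distance(expl, target_expl):
    """Mean squared error between two saliency maps (an assumed metric)."""
    return float(np.mean((expl - target_expl) ** 2))


def blinding_loss(probs_clean, label, probs_trig, target_label,
                  expl_trig, target_expl, lam=1.0):
    """Hypothetical joint objective with three terms:
    1. keep the clean input correctly classified,
    2. push the triggered input to the attacker's target label,
    3. pull the triggered input's explanation toward a target map
       (e.g. the clean explanation to hide the trigger, or an
       unrelated region for a red herring)."""
    return (cross_entropy(probs_clean, label)
            + cross_entropy(probs_trig, target_label)
            + lam * explanation_distance(expl_trig, target_expl))
```

In practice the probabilities and explanation maps would come from a differentiable model and explanation method, and the loss would be minimized over the model's parameters during training; here they are passed in as plain arrays so the structure of the objective stays visible.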

