On the robustness of randomized classifiers to adversarial examples

This paper investigates the theory of robustness against adversarial attacks. We focus on randomized classifiers (\emph{i.e.} classifiers that output random variables) and provide a thorough analysis of their behavior through the lens of statistical learning theory and information theory. To this end, we introduce a new notion of robustness for randomized classifiers, enforcing local Lipschitzness using probability metrics. Equipped with this definition, we make two new contributions. The first consists in devising new upper bounds for randomized classifiers: more precisely, we bound both the generalization gap and the adversarial gap (\emph{i.e.} the gap between the risk and the worst-case risk under attack). The second is a simple yet efficient noise injection method to design robust randomized classifiers. We show that our results are applicable to a wide range of machine learning models under mild hypotheses. We further corroborate our findings with experimental results using deep neural networks on standard image datasets, namely CIFAR-10 and CIFAR-100. All the robust models we trained simultaneously achieve state-of-the-art accuracy (over $0.82$ clean accuracy on CIFAR-10) and enjoy \emph{guaranteed} robust accuracy bounds ($0.45$ against $\ell_2$ adversaries with magnitude $0.5$ on CIFAR-10).
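To make the idea of noise injection concrete, here is a minimal sketch, not the paper's implementation. It assumes a hypothetical two-class linear base classifier, isotropic Gaussian noise added to the input, and total variation as the probability metric; the names `base_classifier`, `randomized_classifier`, `output_distribution` and `gaussian_tv_bound` are illustrative only. A small input shift flips the deterministic prediction outright, while the output distribution of the noisy classifier moves by a bounded amount. The bound used is the standard closed form for the total variation distance between two Gaussians with the same covariance $\sigma^2 I$, which the outputs cannot exceed by the data processing inequality.

```python
import math
import random
from collections import Counter


def base_classifier(x):
    """Hypothetical deterministic classifier: sign of a fixed linear score."""
    w, b = (1.0, -1.0), 0.0
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0


def randomized_classifier(x, sigma, rng):
    """Inject isotropic Gaussian noise into the input, then classify."""
    noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
    return base_classifier(noisy)


def output_distribution(x, sigma, n_samples=20000, seed=0):
    """Monte Carlo estimate of the class probabilities at x."""
    rng = random.Random(seed)
    counts = Counter(
        randomized_classifier(x, sigma, rng) for _ in range(n_samples)
    )
    return [counts[c] / n_samples for c in (0, 1)]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def gaussian_tv_bound(x, x_adv, sigma):
    """TV distance between N(x, sigma^2 I) and N(x_adv, sigma^2 I)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_adv)))
    # 2 * Phi(dist / (2 * sigma)) - 1, written with the error function.
    return math.erf(dist / (2.0 * sigma * math.sqrt(2.0)))


sigma = 0.5                # assumed noise level, chosen for illustration
x = [0.5, 0.0]             # classified as 1 by the base classifier
x_adv = [0.2, 0.3]         # l2 shift of norm about 0.42, classified as 0

# Without noise the prediction flips completely (TV distance of 1).
det_tv = 0.0 if base_classifier(x) == base_classifier(x_adv) else 1.0

# With noise the two output distributions stay close.
p = output_distribution(x, sigma)
q = output_distribution(x_adv, sigma)
rand_tv = total_variation(p, q)
bound = gaussian_tv_bound(x, x_adv, sigma)

print(f"deterministic TV: {det_tv:.3f}")
print(f"randomized TV (estimate): {rand_tv:.3f}, Gaussian bound: {bound:.3f}")
```

The estimate is a Monte Carlo quantity, so it can fluctuate slightly around the true value; larger `n_samples` tightens it. Increasing `sigma` shrinks the bound for a fixed perturbation norm, at the cost of a noisier (and typically less accurate) classifier.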
