We propose and investigate probabilistic guarantees for the adversarial robustness of classification algorithms. While traditional formal verification approaches for robustness are intractable and sampling-based approaches provide no formal guarantees, our approach efficiently certifies a probabilistic relaxation of robustness. The key idea is to sample an $\epsilon$-net and invoke a local robustness oracle on the sample. Remarkably, the sample size needed to achieve probably approximately global robustness guarantees is independent of the input dimensionality, the number of classes, and the learning algorithm itself. Our approach can therefore be applied even to large neural networks that are beyond the scope of traditional formal verification. Experiments confirm that it characterizes robustness better than state-of-the-art sampling-based approaches and scales better than formal methods.
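To make the procedure concrete, the following is a minimal sketch of how such a certifier could be organized, assuming i.i.d. sampling from the input distribution and an externally supplied local robustness oracle (e.g., a complete verifier run on a small neighborhood of each sample). The function names, the `epsilon`/`delta` parameters, and the PAC-style sample bound $n \geq \ln(1/\delta)/\epsilon$ are illustrative assumptions; the paper's exact $\epsilon$-net construction and guarantee may differ.

```python
import math
from typing import Callable, Sequence

def certify_probabilistic_robustness(
    sample_input: Callable[[], Sequence[float]],
    is_locally_robust: Callable[[Sequence[float]], bool],
    epsilon: float,
    delta: float,
) -> bool:
    """Sketch of a probably-approximately-global robustness check.

    If all n i.i.d. samples pass the local oracle, then with
    probability >= 1 - delta the measure of locally non-robust
    inputs is at most epsilon (standard PAC-style argument).
    """
    # Sample size depends only on epsilon and delta: it is independent
    # of input dimensionality, number of classes, and the model.
    n = math.ceil(math.log(1.0 / delta) / epsilon)
    for _ in range(n):
        x = sample_input()
        # Hypothetical oracle: returns True iff the classifier is
        # locally robust in a neighborhood of x.
        if not is_locally_robust(x):
            return False  # found a witness of non-robustness
    return True  # certify the probabilistic relaxation
```

For instance, with $\epsilon = 10^{-3}$ and $\delta = 10^{-9}$ this requires roughly $2.1 \times 10^{4}$ oracle calls, regardless of whether the classifier is a linear model or a large neural network.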