Wasserstein Adversarial Examples via Projected Sinkhorn Iterations

A rapidly growing area of work has studied the existence of adversarial examples, data points that have been perturbed to fool a classifier, but the vast majority of these works have focused on threat models defined by $\ell_p$ norm-bounded perturbations. In this paper, we propose a new threat model for adversarial attacks based on the Wasserstein distance. In the image classification setting, such distances measure the cost of moving pixel mass, and so naturally cover "standard" image manipulations such as scaling, rotation, translation, and distortion (and can potentially be applied to other settings as well). To generate Wasserstein adversarial examples, we develop a procedure for projecting onto the Wasserstein ball, based upon a modified version of the Sinkhorn iteration. The resulting algorithm can successfully attack image classification models, bringing traditional CIFAR10 models down to 3% accuracy within a Wasserstein ball with radius 0.1 (i.e., moving 10% of the image mass 1 pixel), and we demonstrate that PGD-based adversarial training can improve this adversarial accuracy to 76%. Overall, this work opens up a new direction of study in adversarial robustness, more formally considering convex metrics that accurately capture the invariances that we typically believe should exist in classifiers. Code for all experiments in the paper is available at https://github.com/locuslab/projected_sinkhorn.
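
To make the radius concrete, the following is a minimal sketch of what "moving 10% of the image mass 1 pixel" costs. It is not the projected Sinkhorn procedure from the paper. It assumes a one-dimensional "image" whose pixel values are normalized to sum to 1, with the cost of moving mass equal to the distance in pixels; under those assumptions the Wasserstein distance can be computed exactly from running differences of mass. The names `wasserstein_1d` and `in_wasserstein_ball` and the example histograms are hypothetical and chosen only for illustration.

```python
def wasserstein_1d(x, y):
    """Exact 1-Wasserstein distance between two 1D histograms on unit-spaced bins.

    Assumption: x and y are non-negative and carry the same total mass.
    The mass that must cross each bin boundary is the running difference
    of the two histograms, and each crossing costs one pixel of distance.
    """
    if len(x) != len(y):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(x) - sum(y)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    carried = 0.0  # surplus mass still to be moved to the right
    cost = 0.0
    for xi, yi in zip(x, y):
        carried += xi - yi
        cost += abs(carried)
    return cost


def in_wasserstein_ball(x, y, radius, tol=1e-12):
    """Check whether y lies within the given Wasserstein radius of x."""
    return wasserstein_1d(x, y) <= radius + tol


# Hypothetical 4-pixel image, normalized so that the mass sums to 1.
image = [0.2, 0.5, 0.3, 0.0]
# 10% of the mass moved one pixel to the right: cost is 0.1 up to rounding.
moved_one_pixel = [0.2, 0.4, 0.4, 0.0]
# The same 10% moved two pixels to the right: cost is 0.2 up to rounding.
moved_two_pixels = [0.2, 0.4, 0.3, 0.1]

print(wasserstein_1d(image, moved_one_pixel))
print(in_wasserstein_ball(image, moved_one_pixel, radius=0.1))
print(in_wasserstein_ball(image, moved_two_pixels, radius=0.1))
```

In this toy setting the first perturbation sits on the boundary of the radius-0.1 ball and the second falls outside it. Images are two-dimensional, where no such closed form applies, which is why an iterative scheme such as Sinkhorn is needed there.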
