Model stealing attacks aim to build a substitute model that replicates the functionality of a victim model. However, most existing methods rely on the full probability outputs of the victim model, which are unavailable in most realistic scenarios. In the more practical hard-label setting, existing methods suffer catastrophic performance degradation because hard labels lack the rich information carried by probability predictions. Inspired by knowledge distillation, we propose a novel hard-label model stealing method termed \emph{black-box dissector}, which comprises a CAM-driven erasing strategy that mines the information hidden in the victim model's hard labels, and a random-erasing-based self-knowledge distillation module that uses soft labels from the substitute model to avoid the overfitting and miscalibration caused by hard labels. Extensive experiments on four widely used datasets consistently show that our method outperforms state-of-the-art methods by up to $9.92\%$. Experiments on real-world APIs further prove the effectiveness of our method, and its ability to invalidate existing defense methods demonstrates its practical potential.
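The core of the CAM-driven erasing strategy is to occlude the region a class activation map attends to most, then re-query the victim on the erased image to expose information the single hard label hides. A minimal sketch of that erasing step follows; the function name, the zero-fill occlusion, and the 10% default erased area are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def cam_driven_erase(image, cam, erase_frac=0.1):
    """Erase the most-activated CAM region of an image.

    image: (H, W) array of pixel values.
    cam:   (H, W) class activation map from the substitute model.
    erase_frac: fraction of pixels to erase (illustrative default).
    Returns the erased image and the boolean erase mask.
    """
    flat = cam.ravel()
    k = max(1, int(erase_frac * flat.size))
    # Threshold at the k-th largest activation value.
    thresh = np.partition(flat, -k)[-k]
    mask = cam >= thresh
    erased = image.copy()
    erased[mask] = 0.0  # zero-fill the attended region (one possible choice)
    return erased, mask

# Toy example: erase the top 25% most-attended pixels of an 8x8 image.
rng = np.random.default_rng(0)
img = rng.random((8, 8))
cam = rng.random((8, 8))
erased_img, mask = cam_driven_erase(img, cam, erase_frac=0.25)
```

The erased image would then be sent to the victim API: if the hard label changes, the erased region was decisive for the prediction, which is the hidden signal the method exploits.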