Discrete adversarial attacks are symbolic perturbations to a language input that preserve the output label but lead to a prediction error. While such attacks have been extensively explored for the purpose of evaluating model robustness, their utility for improving robustness has been limited to offline augmentation only. Concretely, given a trained model, attacks are used to generate perturbed (adversarial) examples, and the model is re-trained exactly once. In this work, we address this gap and leverage discrete attacks for online augmentation, where adversarial examples are generated at every training step, adapting to the changing nature of the model. We propose (i) a new discrete attack, based on best-first search, and (ii) random sampling attacks that, unlike prior work, do not rely on expensive search procedures. Surprisingly, we find that random sampling leads to impressive gains in robustness, outperforming the commonly-used offline augmentation while providing a ~10x speedup in training time. Furthermore, online augmentation with search-based attacks justifies the higher training cost, significantly improving robustness on three datasets. Finally, we show that our new attack substantially improves robustness compared to prior methods.
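A minimal sketch of the online-augmentation idea with a random sampling attack, using a toy classifier and a hypothetical synonym table (all names here are illustrative assumptions, not the paper's implementation): at each step, random single-token perturbations of the input are sampled against the *current* model, and a sample that flips the prediction is returned as the adversarial example.

```python
import random

# Hypothetical synonym table used to generate symbolic perturbations.
SYNONYMS = {"good": ["decent", "ok"], "bad": ["poor", "awful"]}

def toy_predict(tokens):
    """Toy sentiment 'model': counts positive vs. negative cue words."""
    pos = sum(t in {"good", "great", "fine"} for t in tokens)
    neg = sum(t in {"bad", "poor", "awful"} for t in tokens)
    return 1 if pos >= neg else 0

def random_sampling_attack(tokens, label, num_samples=8, rng=random):
    """Sample random single-token perturbations of `tokens`; return the
    first one that flips the model's prediction, else the original input.
    Unlike search-based attacks, each sample is drawn independently."""
    positions = [i for i, t in enumerate(tokens) if t in SYNONYMS]
    for _ in range(num_samples):
        if not positions:
            break
        cand = list(tokens)
        i = rng.choice(positions)
        cand[i] = rng.choice(SYNONYMS[cand[i]])
        if toy_predict(cand) != label:
            return cand  # adversarial example found
    return list(tokens)

# In online augmentation, this attack would run inside the training loop,
# so the perturbations adapt as the model's parameters change; offline
# augmentation would instead generate all perturbations once, up front.
```

Because each candidate is an independent random draw rather than a node expansion in a search tree, the per-example cost is a fixed number of forward passes, which is the source of the training-time speedup reported above.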