Deep learning (DL) is used extensively for text classification. However, researchers have demonstrated the vulnerability of such classifiers to adversarial attacks. Attackers modify the text in a way that misleads the classifier while leaving the original meaning largely intact. State-of-the-art (SOTA) attack algorithms follow the general principle of making minimal changes to the text so as not to jeopardize semantics. Taking advantage of this, we propose a novel and intuitive defense strategy called Sample Shielding. It is attacker- and classifier-agnostic, requires no reconfiguration of the classifier or external resources, and is simple to implement. Essentially, we sample subsets of the input text, classify each subset, and summarize the individual decisions into a final one. We shield three popular DL text classifiers with Sample Shielding and test their resilience against four SOTA attackers across three datasets in a realistic threat setting. Even when the adversary is given the advantage of knowing about our shielding strategy, its attack success rate is at most 10% with only one exception, and often below 5%. Additionally, Sample Shielding maintains near-original accuracy when applied to original texts. Crucially, we show that the `make minimal changes' approach of SOTA attackers leads to critical vulnerabilities that can be defended against with an intuitive sampling strategy.
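To make the sampling-and-voting idea concrete, here is a minimal sketch of the mechanism described above, assuming a black-box `classify` function that maps a string to a label. The function name, the word-level sampling granularity, and the defaults (`num_samples`, `keep_ratio`) are illustrative assumptions, not the paper's exact configuration.

```python
import random
from collections import Counter

def sample_shield_predict(text, classify, num_samples=10, keep_ratio=0.7, seed=0):
    """Classify `text` by majority vote over random word-level subsamples.

    `classify` maps a string to a label. `keep_ratio` is the fraction of
    words retained in each sample. (Names and defaults are hypothetical,
    chosen for illustration only.)
    """
    rng = random.Random(seed)
    words = text.split()
    if not words:
        return classify(text)
    k = max(1, int(len(words) * keep_ratio))
    votes = []
    for _ in range(num_samples):
        # Draw a random subset of word positions, preserving word order.
        kept_idx = sorted(rng.sample(range(len(words)), k))
        sample = " ".join(words[i] for i in kept_idx)
        votes.append(classify(sample))
    # Summarize the per-sample decisions into a single final label.
    return Counter(votes).most_common(1)[0][0]
```

The intuition is that a minimally perturbed attack text concentrates its changes on a few words, so many random subsamples omit enough of the perturbation for the underlying classifier to recover the correct label, while clean texts remain classified correctly from most subsets.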