Defending against security threats has been a focus of recent studies. Recent work has shown that attacking a natural language processing (NLP) model is not difficult, while defending against such attacks remains a cat-and-mouse game. Backdoor attacks are one such attack, in which a neural network is made to behave in a specific way on samples containing certain triggers while producing normal results on all other samples. In this work, we present several defense strategies that can counter such attacks. We show that our defense methodologies significantly degrade the attack's effectiveness on poisoned inputs while maintaining similar performance on benign inputs. We also show that some of our defenses incur very little runtime overhead and preserve the similarity of the defended inputs to the original inputs.