Phishing and related cyber threats are growing more varied and technologically sophisticated. Among them, email-based phishing remains the most prevalent and persistent threat. These attacks exploit human vulnerabilities to disseminate malware or gain unauthorized access to sensitive information. Deep learning (DL) models, particularly transformer-based architectures, have substantially improved phishing mitigation through their contextual understanding of language. However, emerging threats, notably Artificial Intelligence (AI)-generated phishing attacks, are eroding the resilience of existing phishing detectors. In response, adversarial training has shown promise against AI-generated phishing threats. This study presents a hybrid approach that uses DistilBERT, a smaller, faster, and lighter variant of the BERT transformer, for email classification. Robustness against text-based adversarial perturbations is reinforced through Fast Gradient Method (FGM) adversarial training. The framework further integrates the LIME Explainable AI (XAI) technique to improve the transparency of the DistilBERT classifier, and employs the Flan-T5-small language model from Hugging Face to generate plain-language security narratives for end users. This combined approach delivers accurate phishing classification while providing easily understandable justifications for the model's decisions.
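To illustrate the FGM step mentioned above, here is a minimal sketch of how an L2-normalized gradient perturbation is applied to token embeddings during adversarial training. The toy mean-pooled logistic "classifier", the shapes, and the `epsilon` value are illustrative assumptions, not the paper's actual DistilBERT setup; in practice the gradient would come from backpropagation through the transformer.

```python
import numpy as np

def fgm_perturb(embeddings, grad, epsilon=0.5):
    """FGM step: move embeddings by epsilon * grad / ||grad||_2."""
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return embeddings
    return embeddings + epsilon * grad / norm

# Toy setup (assumed for illustration): one "email" of 4 tokens with
# 3-dimensional embeddings, binary label y = 1 (phishing).
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 3))   # token embeddings
w = rng.normal(size=3)          # linear classifier weights
y = 1.0

def loss_and_grad(emb):
    pooled = emb.mean(axis=0)               # mean-pool over tokens
    p = 1.0 / (1.0 + np.exp(-(pooled @ w))) # sigmoid probability
    loss = -np.log(p) if y == 1.0 else -np.log(1.0 - p)
    # d(loss)/d(emb_i) = (p - y) * w / n_tokens, identical for every token
    g = np.tile((p - y) * w / emb.shape[0], (emb.shape[0], 1))
    return loss, g

clean_loss, g = loss_and_grad(emb)
adv_emb = fgm_perturb(emb, g, epsilon=0.5)
adv_loss, _ = loss_and_grad(adv_emb)
# Moving along the gradient raises the loss; training on (adv_emb, y)
# in addition to the clean pair is what hardens the classifier.
```

In full adversarial training, the model is optimized on the sum of the clean loss and the loss computed on the perturbed embeddings, which is what confers robustness to small text-level perturbations.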