As phishing campaigns have grown more sophisticated in recent years, phishing emails increasingly lure recipients with legitimate-looking personal context. Tackling this problem calls for detection systems that are more adaptive than traditional heuristics-based algorithms, such as natural language processing (NLP)-powered approaches that learn representations of phishing text. However, phishing data may contain confidential information, and concerns about collecting it hinder effective model learning. We propose Federated Phish Bowl (FedPB), a decentralized phishing email detection framework that facilitates collaborative phishing detection while preserving privacy. In particular, we devise a knowledge-sharing mechanism based on federated learning (FL). Using long short-term memory (LSTM) for phishing detection, the framework adapts by sharing a global word embedding matrix across clients, while each client runs its local model on non-IID data. We collected the most recent phishing samples to study the effectiveness of the proposed method under different numbers of clients and data distributions. The results show that FedPB attains performance competitive with a centralized phishing detector and generalizes across various FL settings, retaining a prediction accuracy of 83%.
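To make the knowledge-sharing step concrete, the following is a minimal sketch of how a server might combine client word embedding matrices into one global matrix. The abstract does not state the aggregation rule, so the sample-weighted average used here (in the style of FedAvg) is an assumption, and the names `aggregate_embeddings`, `client_a`, and `client_b` are hypothetical. The local LSTM training on each client is omitted.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows = vocabulary entries, columns = embedding dims


def aggregate_embeddings(client_matrices: Sequence[Matrix],
                         client_sizes: Sequence[int]) -> Matrix:
    """Sample-weighted average of client embedding matrices.

    ASSUMPTION: the source does not specify the aggregation rule; weighting
    each client by its local sample count is a common FedAvg-style choice
    and is used here purely for illustration.
    """
    if not client_matrices or len(client_matrices) != len(client_sizes):
        raise ValueError("need exactly one sample count per client matrix")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    rows, cols = len(client_matrices[0]), len(client_matrices[0][0])
    global_matrix = [[0.0] * cols for _ in range(rows)]
    for matrix, size in zip(client_matrices, client_sizes):
        weight = size / total  # this client's share of all training samples
        for i in range(rows):
            for j in range(cols):
                global_matrix[i][j] += weight * matrix[i][j]
    return global_matrix


# Two hypothetical clients holding different amounts of local data.
client_a = [[1.0, 0.0], [0.0, 2.0]]  # 2-word vocabulary, 2-dim embeddings
client_b = [[3.0, 4.0], [4.0, 6.0]]
global_embedding = aggregate_embeddings([client_a, client_b], [300, 100])
print(global_embedding)
```

In a full round, each client would presumably load `global_embedding` into its local model, train on its own emails, and send back only the updated matrix, so raw email text never leaves the client.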