Phishing attacks are the most common type of cyber-attack used to obtain sensitive information, and they affect individuals as well as organisations across the globe. Various techniques have been proposed to identify phishing attacks, in particular the deployment of machine intelligence in recent years. However, the algorithms deployed and the discriminating factors used vary widely across existing works. In this study, we present a comprehensive analysis of various machine learning algorithms and evaluate their performance over multiple datasets. We further investigate the most significant features within these datasets and compare the classification performance on the reduced-dimensionality datasets. The statistical results indicate that random forest and artificial neural network classifiers outperform the other classification algorithms, achieving over 97% accuracy using the identified features.