A comparative study of neural network techniques for automatic software vulnerability detection

From arXiv. This paper was published on April 28, 2021. However, the published manuscript contains some experimental data issues caused by errors in calculating the indicators. This paper is a revised version.

Software vulnerabilities are usually caused by design flaws or implementation errors, which can be exploited to compromise the security of a system. At present, the most commonly used method for detecting software vulnerabilities is static analysis. Most related techniques work at the source code level, are based on rules or code similarity, and rely on manually defined vulnerability features. However, these rules and features are difficult to define and design accurately, so static analysis faces many challenges in practical applications. To alleviate this problem, some researchers have proposed using neural networks, which can extract features automatically, to make detection more intelligent. However, there are many types of neural networks, and different data preprocessing methods have a significant impact on model performance. Choosing a suitable neural network and data preprocessing method for a given problem is therefore a great challenge for engineers and researchers. To address this, we conducted extensive experiments testing the performance of the two most typical neural networks (i.e., Bi-LSTM and RVFL) with the two most classical data preprocessing methods (i.e., vector representation and program symbolization) on software vulnerability detection problems, and obtained a series of research conclusions that can provide valuable guidelines for researchers and engineers. Specifically, we found that 1) RVFL always trains faster than Bi-LSTM, but the Bi-LSTM model achieves higher prediction accuracy than RVFL; 2) using doc2vec for vector representation gives the model faster training and better generalization ability than using word2vec; and 3) multi-level symbolization helps improve the precision of neural network models.
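To make the program symbolization step more concrete, here is a minimal Python sketch of the general idea: user-defined identifiers in a C snippet are replaced by generic placeholders so that the model sees structure rather than arbitrary names. Everything in it is an assumption made for illustration, not the paper's implementation: the `symbolize` function, the `FUN`/`VAR` placeholder scheme, the tiny keyword and library lists, and the two flags, which are only a rough stand-in for "multi-level" symbolization since the abstract does not define the levels.

```python
import re

# Assumed, deliberately tiny lists; a real tool would use a full C lexer.
C_KEYWORDS = {"int", "char", "void", "if", "else", "for", "while", "return", "sizeof"}
LIBRARY_FUNCS = {"strcpy", "memcpy", "malloc", "free", "printf"}

# Whole identifiers only, so the "x1F" inside "0x1F" is not matched.
IDENT = re.compile(r"\b[A-Za-z_]\w*\b")


def symbolize(code, rename_funcs=True, rename_vars=True):
    """Replace user-defined names with FUN1.. / VAR1.. placeholders.

    Hypothetical helper: each distinct name always maps to the same
    placeholder, and keywords and known library calls are left untouched.
    """
    func_map, var_map = {}, {}

    def repl(match):
        name = match.group(0)
        if name in C_KEYWORDS or name in LIBRARY_FUNCS:
            return name
        # Heuristic: an identifier directly followed by "(" is a function.
        is_func = code[match.end():].lstrip().startswith("(")
        if is_func:
            if not rename_funcs:
                return name
            return func_map.setdefault(name, f"FUN{len(func_map) + 1}")
        if not rename_vars:
            return name
        return var_map.setdefault(name, f"VAR{len(var_map) + 1}")

    return IDENT.sub(repl, code)


sample = "void copy(char *dst, char *src) { char buf[10]; strcpy(buf, src); copy(dst, buf); }"
print(symbolize(sample))                     # functions and variables renamed
print(symbolize(sample, rename_vars=False))  # only function names renamed
```

The library call `strcpy` is kept as-is on purpose, since such calls are often the very signal a vulnerability detector needs. A regex-based pass like this ignores string literals, comments, and macros, so it should be read as a sketch of the preprocessing idea only.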


