We explore the applicability of Graph Neural Networks to learning the nuances of source code from a security perspective. Specifically, we ask whether signatures of vulnerabilities in source code can be learned from its graph representation, in terms of relationships between nodes and edges. We create a pipeline, which we call AI4VA, that first encodes a source code sample into a Code Property Graph. The extracted graph is then vectorized in a manner that preserves its semantic information. A Gated Graph Neural Network is then trained on several such graphs to automatically extract templates that differentiate the graph of a vulnerable sample from that of a healthy one. Our model outperforms static analyzers, classic machine learning, and CNN- and RNN-based deep learning models on two of the three datasets we experiment with. We thus show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches. (Submitted Oct 2019, Paper #28, ICST)
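The graph-then-gated-propagation idea in the abstract can be illustrated with a small sketch. This is not the AI4VA implementation: the node kinds, edge types, state dimension, and random (untrained) weights below are all assumptions chosen for illustration, and the toy graph is only loosely modeled on a Code Property Graph. It shows one common formulation of a gated graph propagation step, where each node sums messages from its incoming typed edges and updates its state with a GRU-style gate.

```python
import math
import random

def matvec(m, v):
    # Multiply a square matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def rand_matrix(rng, dim):
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

def make_params(edge_types, dim, seed=0):
    # Untrained random weights (assumption); a real model would learn these.
    rng = random.Random(seed)
    params = {k: rand_matrix(rng, dim) for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
    params["edge"] = {t: rand_matrix(rng, dim) for t in edge_types}
    return params

def ggnn_step(states, edges, params):
    """One gated propagation step over a typed, directed graph.

    states: {node_id: [float, ...]}, edges: [(src, edge_type, dst), ...]
    """
    dim = len(next(iter(states.values())))
    # Message passing: each edge type has its own weight matrix.
    msgs = {n: [0.0] * dim for n in states}
    for src, etype, dst in edges:
        msgs[dst] = add(msgs[dst], matvec(params["edge"][etype], states[src]))
    new_states = {}
    for n, h in states.items():
        a = msgs[n]
        z = sigmoid(add(matvec(params["Wz"], a), matvec(params["Uz"], h)))  # update gate
        r = sigmoid(add(matvec(params["Wr"], a), matvec(params["Ur"], h)))  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        pre = add(matvec(params["Wh"], a), matvec(params["Uh"], rh))
        cand = [math.tanh(x) for x in pre]                                  # candidate state
        new_states[n] = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
    return new_states

# Hypothetical toy graph: node kinds and edge types are illustrative only.
NODE_KINDS = ["call", "identifier", "literal"]
nodes = {"n0": "call", "n1": "identifier", "n2": "literal"}
edges = [("n0", "ast", "n1"), ("n0", "ast", "n2"), ("n1", "dataflow", "n0")]

# Vectorize: one-hot node kind as the initial hidden state.
states = {n: [1.0 if k == kind else 0.0 for k in NODE_KINDS] for n, kind in nodes.items()}
params = make_params(["ast", "dataflow"], dim=len(NODE_KINDS))
for _ in range(3):
    states = ggnn_step(states, edges, params)

# Sum-pool node states into a single graph-level vector.
graph_vector = [sum(h[i] for h in states.values()) for i in range(len(NODE_KINDS))]
```

In a trained setting, a graph-level vector like `graph_vector` would feed a classifier that separates vulnerable from healthy samples; here the weights are random, so the values carry no meaning beyond showing the data flow.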