Program representation, which aims to convert program source code into vectors of automatically extracted features, is a fundamental problem in programming language processing (PLP). Recent work represents programs with neural networks built on source code structures. However, such methods often focus on syntax and consider only a single perspective of a program, which limits the representational power of the models. This paper proposes a multi-view graph (MVG) program representation method. MVG pays more attention to code semantics and includes both data flow and control flow simultaneously as multiple views. These views are then combined and processed by a graph neural network (GNN) to obtain a comprehensive program representation that covers various aspects. We thoroughly evaluate the proposed MVG approach in the context of algorithm detection, an important and challenging subfield of PLP. Specifically, we use the public POJ-104 dataset and also construct a new, challenging dataset, ALG-109, to test our method. In experiments, MVG significantly outperforms previous methods, demonstrating our model's strong capability for representing source code.
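The idea of combining several views of one program and passing them through a GNN can be illustrated with a minimal sketch. It is not the authors' implementation: the toy program, the edge lists, the one-hot node features, the fixed per-view weights (standing in for learned per-relation parameters), and the function names `merge_views`, `message_pass`, and `readout` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy program: each node is one statement.
NODES = ["x = 1", "y = x + 2", "if y > 2", "print(y)"]

# Two views over the same nodes; edges are (src, dst) index pairs (assumed).
CONTROL_FLOW = [(0, 1), (1, 2), (2, 3)]
DATA_FLOW = [(0, 1), (1, 2), (1, 3)]


def merge_views(views):
    """Combine per-view edge lists into one list of (src, dst, view_name) edges."""
    merged = []
    for name, edges in views.items():
        for src, dst in edges:
            merged.append((src, dst, name))
    return merged


def message_pass(features, edges, view_weights):
    """One round of mean-aggregated message passing over typed edges.

    Each message is the source node's vector scaled by a fixed per-view
    weight, a stand-in for parameters a real GNN would learn.
    """
    dim = len(features[0])
    incoming = defaultdict(list)
    for src, dst, view in edges:
        w = view_weights[view]
        incoming[dst].append([w * v for v in features[src]])
    updated = []
    for node, own in enumerate(features):
        msgs = incoming[node]
        if msgs:
            agg = [sum(m[i] for m in msgs) / len(msgs) for i in range(dim)]
        else:
            agg = [0.0] * dim
        # Residual-style update: keep the node's own vector and add the aggregate.
        updated.append([own[i] + agg[i] for i in range(dim)])
    return updated


def readout(features):
    """Mean-pool node vectors into a single program vector."""
    dim = len(features[0])
    return [sum(f[i] for f in features) / len(features) for i in range(dim)]


edges = merge_views({"control": CONTROL_FLOW, "data": DATA_FLOW})
# One-hot initial features, one dimension per node (illustrative only).
features = [[1.0 if i == j else 0.0 for j in range(len(NODES))]
            for i in range(len(NODES))]
program_vector = readout(
    message_pass(features, edges, {"control": 1.0, "data": 0.5})
)
```

Keeping the view name on every edge lets the aggregation treat control-flow and data-flow neighbours differently, which is the point of merging views rather than encoding each one in isolation. A trained model would replace the fixed weights with learned parameters and stack several such rounds.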