Graphs provide a natural way to represent data by encoding information about objects and the relationships between them. With the ever-increasing amount of data collected and generated, locating specific patterns of relationships between objects in a graph is often required. Given a larger graph and a smaller graph, one may wish to identify instances of the smaller query graph in the larger target graph. This task is called subgraph identification or matching. Subgraph matching is helpful in areas such as bioinformatics, binary analysis, pattern recognition, and computer vision. In these applications, datasets frequently contain noise and errors, thus exact subgraph matching algorithms do not apply. In this paper we introduce a new customizable algorithm for inexact subgraph matching. Our algorithm utilizes node and edge attributes which are often present in real-world datasets to narrow down the search space. The algorithm is flexible in the type of subgraph matching it can perform and the types of datasets it can process by its use of a modifiable graph edit distance cost function for pairing nodes. We show its effectiveness on family trees graphs and control-flow graphs.
翻译:图通过编码对象及其间关系的信息,为数据表示提供了一种自然的方式。随着收集和生成的数据量不断增加,通常需要在图中定位对象间特定的关系模式。给定一个较大的图和一个较小的图,人们可能希望在较大的目标图中识别较小查询图的实例。这一任务称为子图识别或匹配。子图匹配在生物信息学、二进制分析、模式识别和计算机视觉等领域具有重要应用价值。在这些应用中,数据集常包含噪声和误差,因此精确子图匹配算法并不适用。本文提出了一种新的可定制非精确子图匹配算法。该算法利用现实数据集中常存在的节点和边属性来缩小搜索空间。通过采用可修改的图编辑距离成本函数进行节点配对,该算法在执行子图匹配的类型和处理的数据集类型方面具有灵活性。我们通过在族谱图和控流图上的实验验证了其有效性。