In general, to draw robust conclusions from a dataset, the entire population under analysis must be represented in that dataset. A dataset that does not fulfill this condition normally leads to selection bias. Graphs, meanwhile, are used to model a wide variety of problems. Although synthetic graphs can augment the available real graph datasets to overcome selection bias, generating unbiased synthetic datasets is difficult with current tools. In this work, we propose a method to find a synthetic graph dataset that evenly represents graphs with different metric values. The resulting dataset can then be used, among other things, for benchmarking graph processing techniques, such as the accuracy of different Graph Neural Network (GNN) models or the speedups obtained by different graph processing acceleration frameworks.
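The abstract does not describe the proposed method itself, so the following is only an illustrative sketch of what "even representation" over a metric can mean, not the authors' algorithm. It assumes a single metric (edge density), equal-width bins, and a naive stratified rejection sampler over Erdős-Rényi graphs; the names `random_graph`, `density`, and `balanced_sample` and all parameter values are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def random_graph(n, p, rng):
    """Erdős-Rényi G(n, p) graph as a set of undirected edges (u, v), u < v."""
    return {(u, v) for u, v in combinations(range(n), 2) if rng.random() < p}


def density(n, edges):
    """Fraction of the n*(n-1)/2 possible undirected edges that are present."""
    return len(edges) / (n * (n - 1) / 2)


def balanced_sample(n_nodes=20, n_bins=5, per_bin=10, seed=0, max_tries=100_000):
    """Keep `per_bin` graphs in each of `n_bins` equal-width density bins.

    Assumption: density is the only metric of interest. Graphs that fall
    into an already full bin are discarded (rejection sampling).
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for _ in range(max_tries):
        # Stop once every bin exists and holds exactly `per_bin` graphs.
        if len(bins) == n_bins and all(len(b) == per_bin for b in bins.values()):
            break
        p = rng.random()  # vary the edge probability to reach all densities
        edges = random_graph(n_nodes, p, rng)
        d = density(n_nodes, edges)
        idx = min(int(d * n_bins), n_bins - 1)  # density 1.0 goes in the last bin
        if len(bins[idx]) < per_bin:
            bins[idx].append((d, edges))
    return dict(bins)


if __name__ == "__main__":
    dataset = balanced_sample()
    for idx in sorted(dataset):
        print(f"bin {idx}: {len(dataset[idx])} graphs")
```

Rejection sampling of this kind becomes wasteful when several metrics are balanced at once or when some metric ranges are rarely produced by the generator, which is the kind of difficulty the abstract attributes to current tools.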