基于机器学习和关联规则挖掘预测与解释图组合优化问题难度的通用框架 (Towards a General Framework for Predicting and Explaining the Hardness of Graph-based Combinatorial Optimization Problems using Machine Learning and Association Rule Mining)

Towards a General Framework for Predicting and Explaining the Hardness of Graph-based Combinatorial Optimization Problems using Machine Learning and Association Rule Mining

翻译：基于机器学习和关联规则挖掘预测与解释图组合优化问题难度的通用框架

Bharat Sharman,Elkafi Hassini

This study introduces GCO-HPIF, a general machine-learning-based framework to predict and explain the computational hardness of combinatorial optimization problems that can be represented on graphs. The framework consists of two stages. In the first stage, a dataset is created comprising problem-agnostic graph features and hardness classifications of problem instances. Machine-learning-based classification algorithms are trained to map graph features to hardness categories. In the second stage, the framework explains the predictions using an association rule mining algorithm. Additionally, machine-learning-based regression models are trained to predict algorithmic computation times. The GCO-HPIF framework was applied to a dataset of 3287 maximum clique problem instances compiled from the COLLAB, IMDB, and TWITTER graph datasets using five state-of-the-art algorithms, namely three exact branch-and-bound-based algorithms (Gurobi, CliSAT, and MOMC) and two graph-neural-network-based algorithms (EGN and HGS). The framework demonstrated excellent performance in predicting instance hardness, achieving a weighted F1 score of 0.9921, a minority-class F1 score of 0.878, and an ROC-AUC score of 0.9083 using only three graph features. The best association rule found by the FP-Growth algorithm for explaining the hardness predictions had a support of 0.8829 for hard instances and an overall accuracy of 87.64 percent, underscoring the framework's usefulness for both prediction and explanation. Furthermore, the best-performing regression model for predicting computation times achieved a percentage RMSE of 5.12 and an R2 value of 0.991.

翻译：本研究提出了GCO-HPIF，一个基于机器学习的通用框架，用于预测和解释可表示为图的组合优化问题的计算难度。该框架包含两个阶段。第一阶段创建包含问题无关图特征与问题实例难度分类的数据集，训练基于机器学习的分类算法将图特征映射到难度类别。第二阶段使用关联规则挖掘算法解释预测结果。此外，训练基于机器学习的回归模型以预测算法计算时间。GCO-HPIF框架应用于由COLLAB、IMDB和TWITTER图数据集编译的3287个最大团问题实例，采用五种先进算法：三种精确分支定界算法（Gurobi、CliSAT和MOMC）和两种基于图神经网络的算法（EGN和HGS）。该框架在预测实例难度方面表现出色，仅使用三个图特征即获得加权F1分数0.9921、少数类F1分数0.878和ROC-AUC分数0.9083。FP-Growth算法发现的最佳关联规则对困难实例的支持度为0.8829，整体准确率达87.64%，凸显了框架在预测与解释方面的实用性。此外，预测计算时间的最佳回归模型获得百分比RMSE为5.12，R2值为0.991。