A Deep Reinforcement Learning Framework For Column Generation

Column Generation (CG) is an iterative algorithm for solving linear programs (LPs) with an extremely large number of variables (columns). CG is the workhorse for tackling large-scale integer linear programs, which rely on CG to solve LP relaxations within a branch and bound algorithm. Two canonical applications are the Cutting Stock Problem (CSP) and Vehicle Routing Problem with Time Windows (VRPTW). In VRPTW, for example, each binary variable represents the decision to include or exclude a route, of which there are exponentially many; CG incrementally grows the subset of columns being used, ultimately converging to an optimal solution. We propose RLCG, the first Reinforcement Learning (RL) approach for CG. Unlike typical column selection rules which myopically select a column based on local information at each iteration, we treat CG as a sequential decision-making problem, as the column selected in an iteration affects subsequent iterations of the algorithm. This perspective lends itself to a Deep Reinforcement Learning approach that uses Graph Neural Networks (GNNs) to represent the variable-constraint structure in the LP of interest. We perform an extensive set of experiments using the publicly available BPPLIB benchmark for CSP and Solomon benchmark for VRPTW. RLCG converges faster and reduces the number of CG iterations by 22.4% for CSP and 40.9% for VRPTW on average compared to a commonly used greedy policy.
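For intuition, one CG pricing step for the CSP can be sketched as below. This is a minimal illustration, not the paper's implementation. It assumes every cutting pattern has cost 1, that the dual values come from an already solved restricted master LP, and that the greedy baseline picks the column with the most negative reduced cost (the usual choice, though the abstract does not spell it out). The names `price_pattern` and `reduced_cost`, and all numbers, are hypothetical.

```python
from typing import List, Tuple


def reduced_cost(pattern: List[int], duals: List[float]) -> float:
    """Reduced cost of a cutting pattern whose objective coefficient is 1."""
    return 1.0 - sum(a * pi for a, pi in zip(pattern, duals))


def price_pattern(
    duals: List[float], widths: List[int], roll_width: int
) -> Tuple[float, List[int]]:
    """Greedy pricing: find the pattern with the most negative reduced cost.

    Solves the unbounded knapsack  max sum(duals[i] * a[i])
    subject to  sum(widths[i] * a[i]) <= roll_width  by dynamic programming.
    """
    # best[c]: largest total dual value that fits in capacity c.
    best = [0.0] * (roll_width + 1)
    # choice[c]: item added last at capacity c, or -1 to inherit from c - 1.
    choice = [-1] * (roll_width + 1)
    for c in range(1, roll_width + 1):
        best[c] = best[c - 1]
        for i, w in enumerate(widths):
            if w <= c and best[c - w] + duals[i] > best[c] + 1e-12:
                best[c] = best[c - w] + duals[i]
                choice[c] = i

    # Walk back through the table to recover the pattern.
    pattern = [0] * len(widths)
    c = roll_width
    while c > 0:
        if choice[c] == -1:
            c -= 1
        else:
            pattern[choice[c]] += 1
            c -= widths[choice[c]]
    return reduced_cost(pattern, duals), pattern


# Illustrative data (assumed, not from BPPLIB): three piece widths, one roll.
rc, pattern = price_pattern(duals=[0.3, 0.5, 0.75], widths=[3, 5, 7], roll_width=10)
if rc < -1e-9:
    print("add column", pattern, "with reduced cost", round(rc, 4))
else:
    print("no improving column: the current LP solution is optimal")
```

A learned policy such as RLCG would replace the single greedy choice with a selection among several candidate columns, taking into account how that choice affects later iterations.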
