There has been a recent focus on designing architectures that place an Integer Linear Programming (ILP) layer within a neural model (referred to as Neural ILP in this paper). Neural ILP architectures are suitable for pure reasoning tasks that require data-driven constraint learning, as well as for tasks requiring both perception (neural) and reasoning (ILP). A recent SOTA approach for end-to-end training of Neural ILP explicitly defines gradients through the ILP black box (Paulus et al. 2021); however, it trains extremely slowly, since the underlying ILP solver is called for every training data point in a minibatch. In response, we present an alternative training strategy that is solver-free, i.e., it does not call the ILP solver at all at training time. A Neural ILP architecture has a set of trainable hyperplanes (for the cost and constraints of the ILP), which together represent a polyhedron. Our key idea is that the training loss should force the final polyhedron to separate the positives (all constraints satisfied) from the negatives (at least one violated constraint, or a suboptimal cost value), via a soft-margin formulation. While positive example(s) are provided as part of the training data, we devise novel techniques for generating negative samples. Our solution is flexible enough to handle equality as well as inequality constraints. Experiments on several problems, both perceptual and symbolic, that require learning the constraints of an ILP show that our approach achieves superior performance and scales much better than purely neural baselines and other state-of-the-art models that require solver-based training. In particular, we obtain excellent performance on 9 x 9 symbolic and visual Sudoku, to which the competing Neural ILP approach is unable to scale.
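To make the soft-margin idea concrete, the following is a minimal, illustrative sketch of how such a separation loss over learnable inequality hyperplanes a_k^T x <= b_k could be written; the function name, the margin handling, and the treatment of negatives are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def soft_margin_separation_loss(A, b, x_pos, X_neg, margin=0.1):
    """Illustrative soft-margin loss for learnable constraints a_k^T x <= b_k.

    A     : (K, n) learnable constraint normals
    b     : (K,)   learnable constraint offsets
    x_pos : (n,)   positive example -- should satisfy every constraint
    X_neg : (m, n) negative samples -- each should violate at least one constraint
    """
    # Positive side: every slack b_k - a_k^T x_pos should exceed the margin.
    slack_pos = b - A @ x_pos                         # shape (K,)
    loss_pos = np.maximum(0.0, margin - slack_pos).sum()

    # Negative side: the smallest slack should fall below -margin,
    # i.e. at least one constraint is violated by at least the margin.
    slack_neg = b[None, :] - X_neg @ A.T              # shape (m, K)
    loss_neg = np.maximum(0.0, margin + slack_neg.min(axis=1)).sum()

    return loss_pos + loss_neg
```

In an actual Neural ILP model the hyperplane parameters (A, b) and the cost vector would be optimized with gradient descent on such a loss, entirely without calling an ILP solver during training.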