Many computational tasks benefit from being formulated as the composition of neural networks followed by a discrete symbolic program. The goal of neurosymbolic learning is to train the neural networks using only end-to-end input-output labels of the composite. We introduce CTSketch, a novel, scalable neurosymbolic learning algorithm. CTSketch uses two techniques to improve the scalability of neurosymbolic inference: decomposing the symbolic program into sub-programs and summarizing each sub-program with a sketched tensor. This strategy allows us to approximate the output distribution of the program with simple tensor operations over the input distributions and the sketches. We provide theoretical insight into the maximum approximation error. Furthermore, we evaluate CTSketch on benchmarks from the neurosymbolic learning literature, including several designed to evaluate scalability. Our results show that CTSketch pushes neurosymbolic learning to previously unattainable scales, with neural predictors obtaining high accuracy on tasks with one thousand inputs, despite supervision only on the final output.
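To make the core idea concrete, below is a minimal sketch of how a sub-program's output distribution can be approximated with a single tensor contraction over the input distributions, as the abstract describes. All names here (`sub_program`, `S`, `p`, `q`) are illustrative assumptions, and the tensor is an exact one-hot summary of the sub-program rather than the compressed sketch CTSketch actually uses; it shows only the shape of the computation, not the paper's method.

```python
import numpy as np

# Hypothetical sub-program: addition of two digits in {0, ..., 9}.
def sub_program(x: int, y: int) -> int:
    return x + y

N_IN, N_OUT = 10, 19  # 10 input symbols, outputs in {0, ..., 18}

# Tensor summarizing the sub-program: S[x, y, z] = 1 iff sub_program(x, y) = z.
# (In CTSketch this tensor would be replaced by a compact sketch.)
S = np.zeros((N_IN, N_IN, N_OUT))
for x in range(N_IN):
    for y in range(N_IN):
        S[x, y, sub_program(x, y)] = 1.0

# Stand-ins for neural predictions: categorical distributions over each input.
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(N_IN))  # distribution over the first input symbol
q = rng.dirichlet(np.ones(N_IN))  # distribution over the second input symbol

# Output distribution via one tensor contraction:
#   r[z] = sum_{x, y} p[x] * q[y] * S[x, y, z]
r = np.einsum("x,y,xyz->z", p, q, S)
assert np.isclose(r.sum(), 1.0)
```

Under a decomposition of the full program into sub-programs, the distribution `r` produced by one such contraction can serve as an input distribution to the next sub-program's tensor, which is what lets the whole pipeline reduce to simple tensor operations.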