Efficient deployment of machine learning models ultimately requires taking hardware constraints into account. The binary logic gate is the fundamental building block of all digital chips. Designing models that operate directly on these units enables energy-efficient computation. Recent work has demonstrated the feasibility of training randomly connected networks of binary logic gates (such as OR and NAND) using gradient-based methods. We extend this approach by using gradient descent not only to select the logic gates but also to optimize their interconnections (the connectome). Optimizing the connections allows us to substantially reduce the number of logic gates required to fit a particular dataset. Our implementation is efficient in both training and inference: for instance, our LILogicNet model with only 8,000 gates can be trained on MNIST in under 5 minutes and achieves 98.45% test accuracy, matching the performance of state-of-the-art models that require at least two orders of magnitude more gates. Moreover, for our largest architecture with 256,000 gates, LILogicNet achieves 60.98% test accuracy on CIFAR-10, exceeding the performance of prior logic-gate-based models with a comparable gate budget. At inference time, the fully binarized model operates with minimal compute overhead, making it exceptionally efficient and well suited for deployment on low-power digital hardware.
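To make the gradient-based gate selection concrete, here is a minimal sketch (not the authors' code) of the standard continuous-relaxation idea behind differentiable logic gate networks: each Boolean gate is replaced by a real-valued function that agrees with it on {0, 1} inputs, and a softmax over learnable logits mixes the candidate gates so that the gate choice itself receives gradients. The gate set and the `soft_gates`/`logits` names are illustrative assumptions.

```python
import numpy as np

def soft_gates(a, b):
    """Real-valued relaxations of binary gates; each equals the
    Boolean gate exactly when a, b are in {0, 1}."""
    return np.array([
        a * b,                    # AND
        a + b - a * b,            # OR
        1.0 - a * b,              # NAND
        1.0 - (a + b - a * b),    # NOR
        a + b - 2.0 * a * b,      # XOR
    ])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Learnable logits parameterize a distribution over the candidate gates.
# The forward pass is a softmax-weighted mixture of all gate outputs, so
# gradients can flow back into the gate choice; at inference the argmax
# gate is hardened into a single binary logic gate.
logits = np.zeros(5)            # uniform initialization over 5 gates
a, b = 1.0, 0.0
out = softmax(logits) @ soft_gates(a, b)
```

After training, the relaxation is discarded: each node keeps only its highest-probability gate, yielding the fully binarized network deployed at inference time.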