Learn to Optimize (L2O) trains deep neural network-based solvers for optimization, with demonstrated success in accelerating the solution of convex problems and improving solution quality on non-convex ones. However, L2O lacks rigorous theoretical guarantees for its own training convergence: existing analyses often rely on unrealistic assumptions, a gap this work highlights empirically. We bridge this gap by proving the training convergence of L2O models that learn Gradient Descent (GD) hyperparameters for quadratic programming, leveraging Neural Tangent Kernel (NTK) theory. We further propose a deterministic initialization strategy that supports our theoretical results and promotes stable training over extended optimization horizons by mitigating gradient explosion. On synthetic datasets, our L2O framework achieves over 50% better optimality than GD and greater robustness than state-of-the-art L2O methods. Our code is available at https://github.com/NetX-lab/MathL2OProof-Official.
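To make the setting concrete, the sketch below illustrates the general idea the abstract describes: a small network that learns GD hyperparameters (here, a state-dependent step size) by unrolling GD on random quadratic programs and minimizing the final objective. This is a minimal illustrative example, not the authors' implementation; all names (`StepSizeNet`, `unroll`, `make_batch`) and design details (network size, horizon, unconstrained quadratics) are hypothetical assumptions for exposition.

```python
# Minimal L2O sketch: learn a GD step size for quadratic objectives by unrolling.
# Illustrative only; not the method or initialization proposed in the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)


class StepSizeNet(nn.Module):
    """Small MLP mapping the current gradient to a positive per-step GD step size."""

    def __init__(self, dim: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1), nn.Softplus()
        )

    def forward(self, grad: torch.Tensor) -> torch.Tensor:
        return self.net(grad)  # shape (batch, 1), broadcast over the iterate


def quad_objective(x, Q, b):
    # f(x) = 1/2 x^T Q x + b^T x, evaluated per sample in the batch
    return 0.5 * torch.einsum("bi,bij,bj->b", x, Q, x) + torch.einsum("bi,bi->b", b, x)


def unroll(model, Q, b, steps=10):
    """Unroll learned GD for a fixed horizon; return the mean final objective."""
    x = torch.zeros(Q.shape[0], Q.shape[1])
    for _ in range(steps):
        grad = torch.einsum("bij,bj->bi", Q, x) + b
        alpha = model(grad)      # learned, state-dependent step size
        x = x - alpha * grad     # one GD step with the learned hyperparameter
    return quad_objective(x, Q, b).mean()


def make_batch(batch=64, dim=8):
    # Random positive-definite quadratics: Q = A A^T / dim + I, random linear term b
    A = torch.randn(batch, dim, dim)
    Q = A @ A.transpose(1, 2) / dim + torch.eye(dim)
    b = torch.randn(batch, dim)
    return Q, b


model = StepSizeNet(dim=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for it in range(200):
    Q, b = make_batch()
    loss = unroll(model, Q, b)   # meta-loss: objective value after the unrolled horizon
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Training the L2O model itself (the outer Adam loop above) is the procedure whose convergence the paper analyzes via NTK theory; the unrolled horizon is where gradient explosion can arise, which motivates the deterministic initialization strategy.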