We propose KOALA++, a scalable Kalman-based optimization algorithm that explicitly models structured gradient uncertainty in neural network training. Unlike second-order methods, which rely on expensive second-order gradient computations, our method directly estimates the parameter covariance matrix by recursively updating compact gradient covariance products. This design improves upon the original KOALA framework, which assumed a diagonal covariance, by implicitly capturing richer uncertainty structure without storing the full covariance matrix or performing large matrix inversions. Across diverse tasks, including image classification and language modeling, KOALA++ achieves accuracy on par with or better than state-of-the-art first- and second-order optimizers while maintaining the efficiency of first-order methods.
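To make the covariance-product idea concrete, the sketch below shows a generic Kalman-style parameter update that tracks only a compact vector v ≈ P·g (covariance times gradient) instead of the full covariance P. This is a minimal illustration under assumed dynamics, not the paper's actual update equations, which are not given in this abstract: the function name koala_like_step, the scalar target_loss, the noise terms q and r, and the specific recursion for v are all assumptions made for illustration.

```python
import numpy as np

def koala_like_step(w, grad, loss, v, q=1e-2, r=1.0, target_loss=0.0):
    """One Kalman-style update (illustrative sketch, not KOALA++'s exact rule).

    State: parameters w (shape (d,)); scalar observation: the loss.
    Instead of the full d x d covariance P, we keep only v ~= P @ grad,
    the compact gradient-covariance product the abstract alludes to.
    """
    # Innovation: discrepancy between the observed loss and its target.
    innovation = loss - target_loss
    # Scalar innovation variance grad^T P grad + R, computed via v ~= P @ grad.
    s = grad @ v + r
    # Kalman gain restricted to the direction of v, so P is never formed.
    k = v / s
    # Parameter update, analogous to w <- w - K * innovation.
    w_new = w - k * innovation
    # Recursive update of the compact product: apply the standard covariance
    # update (P - k v^T) plus process noise q*I to grad -- a hypothetical
    # recursion standing in for the paper's exact one.
    v_new = v - k * (v @ grad) + q * grad
    return w_new, v_new

# Toy usage on a 4-parameter problem.
rng = np.random.default_rng(0)
w = rng.normal(size=4)      # parameters (the Kalman "state")
v = np.ones(4)              # initial compact product, stands in for P @ grad
grad = rng.normal(size=4)   # gradient from one mini-batch
w, v = koala_like_step(w, grad, loss=2.5, v=v)
```

The point of such a recursion is memory: the full covariance P would cost O(d^2) storage for d parameters, while the product v costs O(d), which is what allows a Kalman-style method to retain first-order efficiency at neural-network scale.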