Deep Equilibrium Models (DEQs) are a class of implicit models whose output is defined implicitly as the fixed point of a learned function. These models have been shown to outperform explicit (fixed-depth) models on large-scale tasks by trading many deep layers for a single layer that is iterated many times. However, gradient calculation through DEQs is approximate, which often leads to unstable training dynamics and requires regularisation or many function evaluations to correct. Here, we introduce Reversible Deep Equilibrium Models (RevDEQs), which allow exact gradient calculation, require no regularisation, and use far fewer function evaluations than DEQs. We show that RevDEQs significantly improve performance on language modelling and image classification tasks over comparable implicit and explicit models.
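To make the fixed-point definition concrete, here is a minimal illustrative sketch (not the paper's model or training procedure): a DEQ-style layer's output z* satisfies z* = f(z*, x) for a learned function f, and can be found by simple fixed-point iteration when f is contractive. The toy function below, its dimensions, and the iteration tolerance are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "learned" parameters; the small scale on W keeps f contractive,
# so naive fixed-point iteration converges.
W = 0.1 * rng.standard_normal((4, 4))
b = rng.standard_normal(4)

def f(z, x):
    # One application of the layer function f(z, x).
    return np.tanh(W @ z + x + b)

def fixed_point(x, tol=1e-8, max_iter=500):
    # Iterate z <- f(z, x) until z stops changing: the result
    # approximates the implicit layer output z* = f(z*, x).
    z = np.zeros_like(x)
    for i in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next, i + 1
        z = z_next
    return z, max_iter

x = rng.standard_normal(4)
z_star, iters = fixed_point(x)
# Check the defining property: f(z*, x) should (approximately) equal z*.
residual = np.linalg.norm(f(z_star, x) - z_star)
```

In an actual DEQ the iteration count (number of function evaluations) dominates the compute cost, which is why the abstract emphasises that RevDEQs need far fewer evaluations.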