You Only Linearize Once: Tangents Transpose to Gradients

Automatic differentiation (AD) is conventionally understood as a family of distinct algorithms, rooted in two "modes" -- forward and reverse -- which are typically presented (and implemented) separately. Can there be only one? Following up on the AD systems developed in the JAX and Dex projects, we formalize a decomposition of reverse-mode AD into (i) forward-mode AD followed by (ii) unzipping the linear and non-linear parts and then (iii) transposition of the linear part. To that end, we define a (substructurally) linear type system that can prove a class of functions are (algebraically) linear. Our main results are that forward-mode AD produces such linear functions, and that we can unzip and transpose any such linear function, conserving cost, size, and linearity. Composing these three transformations recovers reverse-mode AD. This decomposition also sheds light on checkpointing, which emerges naturally from a free choice in unzipping `let` expressions. As a corollary, checkpointing techniques are applicable to general-purpose partial evaluation, not just AD. We hope that our formalization will lead to a deeper understanding of automatic differentiation and that it will simplify implementations, by separating the concerns of differentiation proper from the concerns of gaining efficiency (namely, separating the derivative computation from the act of running it backward).
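The three-step decomposition can be sketched with JAX's public API. This is only an illustration under stated assumptions, not the formal system described above: the function `f` and the input `x` are hypothetical, `jax.linearize` stands in for steps (i) and (ii) (forward mode followed by separating the linear tangent map from the non-linear primal computation), and `jax.linear_transpose` stands in for step (iii).

```python
import jax
import jax.numpy as jnp


def f(x):
    # Hypothetical non-linear function, chosen only for illustration.
    return jnp.sin(x) * x


x = jnp.array([1.0, 2.0, 3.0])

# Steps (i) and (ii): evaluate f at x and obtain the tangent map at x.
# f_jvp is linear in its argument; the non-linear work has already been done.
y, f_jvp = jax.linearize(f, x)

# Step (iii): transpose the linear tangent map. The second argument only
# supplies the shape and dtype of the linear input.
f_vjp = jax.linear_transpose(f_jvp, x)

# Applying the transposed map to a cotangent yields a gradient-like quantity.
ct = jnp.ones_like(y)
(grad_via_transpose,) = f_vjp(ct)

# Reference: JAX's built-in reverse mode applied to the same cotangent.
_, vjp_fn = jax.vjp(f, x)
(grad_reference,) = vjp_fn(ct)
```

Under these assumptions the two results should agree up to floating-point error, which is the sense in which composing the three transformations recovers reverse-mode AD.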