Neural scaling laws and double-descent phenomena suggest that deep-network training obeys a simple macroscopic structure despite highly nonlinear optimization dynamics. We derive this structure directly from gradient descent in function space. For the mean-squared-error loss, the training error evolves as $\dot e_t=-M(t)e_t$ with $M(t)=J_{\theta(t)}J_{\theta(t)}^{\!*}$, a time-dependent self-adjoint operator induced by the network Jacobian. Using Kato perturbation theory, we obtain an exact system of coupled mode-wise ODEs in the instantaneous eigenbasis of $M(t)$. To extract macroscopic behavior, we introduce a logarithmic spectral-shell coarse-graining and track the quadratic error energy across shells. Microscopic interactions within each shell cancel identically at the energy level, so shell energies evolve only through dissipation and interactions with other shells. We formalize this via a \emph{renormalizable shell-dynamics} assumption, under which cumulative microscopic effects reduce to a controlled net flux across shell boundaries. Assuming effective power-law spectral transport over the relevant resolution range, the shell dynamics admits a self-similar solution with a moving resolution frontier and explicit scaling exponents. This framework explains neural scaling laws and double descent, and unifies lazy (NTK-like) training and feature learning as two limits of the same spectral-shell dynamics.
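For orientation, the following minimal sketch shows how the function-space ODE and the shell energies arise; the residual convention $e_t = f_{\theta(t)} - y$, the eigenpair notation $(\lambda_i(t),\varphi_i(t))$, and the base-2 shell boundaries are illustrative assumptions, not definitions taken from the paper. Gradient flow on the mean-squared-error loss
\[
  L(\theta) \;=\; \tfrac12\,\lVert e_t\rVert^2, \qquad e_t \;=\; f_{\theta(t)} - y,
\]
reads $\dot\theta(t) = -\nabla_\theta L = -J_{\theta(t)}^{\!*} e_t$, and the chain rule gives
\[
  \dot e_t \;=\; J_{\theta(t)}\,\dot\theta(t) \;=\; -\,J_{\theta(t)} J_{\theta(t)}^{\!*}\, e_t \;=\; -\,M(t)\, e_t .
\]
Expanding in the instantaneous eigenbasis $M(t)\varphi_i(t) = \lambda_i(t)\,\varphi_i(t)$ with coefficients $c_i(t) = \langle e_t, \varphi_i(t)\rangle$, a logarithmic spectral-shell coarse-graining groups modes as
\[
  S_k(t) \;=\; \bigl\{\, i : \lambda_i(t) \in [2^{k}, 2^{k+1}) \,\bigr\}, \qquad
  E_k(t) \;=\; \sum_{i \in S_k(t)} \lvert c_i(t)\rvert^2 ,
\]
so that $\sum_k E_k(t) = \lVert e_t\rVert^2$ is the quadratic error energy tracked shell by shell.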