The inverse problem of supervised reconstruction of depth-variable (time-dependent) parameters in a neural ordinary differential equation (NODE) is considered, that is, finding the weights of a residual network with time-continuous layers. The NODE is treated as an isolated entity describing the full network, in contrast to earlier research, which embedded it between pre- and post-appended layers trained by conventional methods. The proposed parameter reconstruction is carried out for a general first-order differential equation by minimizing a cost functional that covers a variety of loss functions and penalty terms. A nonlinear conjugate gradient (NCG) method is derived for the minimization. Mathematical properties are stated for the differential equation and the cost functional. The adjoint problem needed to compute the gradient is derived together with a sensitivity problem; the sensitivity problem estimates changes in the network output under perturbation of the trained parameters. To preserve smoothness during the iterations, the Sobolev gradient is calculated and incorporated. As a proof of concept, numerical results are included for a NODE and two synthetic datasets and compared with standard gradient-based approaches (not based on NODEs). The results show that the proposed method works well for deep learning with an infinite number of layers and has built-in stability and smoothness.
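To make the adjoint-based gradient computation mentioned above concrete, the following is a minimal sketch, not the authors' implementation: a NODE with illustrative dynamics x'(t) = tanh(W(t) x(t)), a quadratic terminal loss, and a forward-Euler discretization in which the depth-variable weights W(t) are sampled once per time step (dynamics, loss, and discretization are all assumptions made for the example). The adjoint equation λ' = -(∂f/∂x)ᵀ λ is integrated backward in time and yields the gradient of the loss with respect to each weight sample W_k, which is what an NCG iteration would consume.

```python
# Minimal sketch (illustrative assumptions: tanh dynamics, quadratic
# terminal loss, forward-Euler discretization) of adjoint-based gradients
# for a NODE with depth-variable weights W(t).
import numpy as np

def forward(x0, Ws, dt):
    """Forward pass: integrate x' = tanh(W(t) x); keep the trajectory."""
    xs = [x0]
    for W in Ws:                              # W(t) sampled once per step
        xs.append(xs[-1] + dt * np.tanh(W @ xs[-1]))
    return xs

def adjoint_gradient(xs, Ws, dt, target):
    """Backward pass: solve the discrete adjoint lam' = -(df/dx)^T lam
    and accumulate dL/dW_k for L = 0.5 * ||x(T) - target||^2."""
    lam = xs[-1] - target                     # terminal condition lam(T)
    grads = [None] * len(Ws)
    for k in reversed(range(len(Ws))):
        x, W = xs[k], Ws[k]
        s = 1.0 - np.tanh(W @ x) ** 2         # tanh' at the pre-activation
        grads[k] = dt * np.outer(s * lam, x)  # dL/dW_k via the adjoint
        lam = lam + dt * W.T @ (s * lam)      # step the adjoint backward
    return grads

# Usage: 50 "continuous layers" on [0, 1], two-dimensional state.
rng = np.random.default_rng(0)
Ws = [0.1 * rng.standard_normal((2, 2)) for _ in range(50)]
xs = forward(np.array([1.0, -1.0]), Ws, dt=1.0 / 50)
grads = adjoint_gradient(xs, Ws, dt=1.0 / 50, target=np.zeros(2))
```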
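The Sobolev-gradient step can likewise be illustrated by one common construction; the paper's exact inner product and boundary conditions are not given here, so both are assumptions. The H¹ gradient g_S solves (I − d²/dt²) g_S = g with homogeneous Neumann boundary conditions, which smooths the raw L² gradient g over the depth variable t before it enters the descent iteration.

```python
# Minimal sketch of an H^1 (Sobolev) gradient: solve (I - d^2/dt^2) g_S = g
# with homogeneous Neumann BCs on a uniform grid of spacing h.
# The inner product and boundary conditions are assumptions for illustration.
import numpy as np

def sobolev_gradient(g, h):
    n = len(g)
    c = 1.0 / h**2
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 1.0 + 2.0 * c               # (I - d^2/dt^2) stencil
        if i > 0:
            A[i, i - 1] = -c
        if i < n - 1:
            A[i, i + 1] = -c
    # Neumann boundary: ghost-point reflection folds the off-grid
    # neighbor back onto the interior one.
    A[0, 1] = -2.0 * c
    A[-1, -2] = -2.0 * c
    return np.linalg.solve(A, g)
```

Replacing the L² gradient with g_S in each iteration acts as a preconditioner: it damps high-frequency components of the update, which is one standard way to obtain the smoothness the abstract attributes to the method.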