One frequently wishes to learn a range of similar tasks as efficiently as possible, re-using knowledge across tasks. In artificial neural networks, this is typically accomplished by conditioning the network on task context, injected as an additional input. Brains have a different strategy: the parameters themselves are modulated as a function of various neuromodulators such as serotonin. Here, we take inspiration from neuromodulation and propose to learn weights that are smoothly parameterized functions of task context variables. Rather than optimize a weight vector, i.e. a single point in weight space, we optimize a smooth manifold in weight space with a predefined topology. To accomplish this, we derive a formal treatment of the optimization of manifolds as the minimization of a loss functional subject to a constraint on volumetric movement, analogous to gradient descent. During inference, conditioning selects a single point on this manifold, which serves as the effective weight matrix for a particular sub-task. This strategy for conditioning has two main advantages. First, the topology of the manifold (whether a line, circle, or torus) is a convenient lever for inductive biases about the relationship between tasks. Second, learning in one state smoothly affects the entire manifold, encouraging generalization across states. To verify this, we train manifolds with several topologies, including straight lines in weight space (for conditioning on, e.g., the noise level of the input data) and ellipses (for conditioning on image rotation). Despite their simplicity, these parameterizations outperform otherwise-identical networks conditioned by input concatenation and generalize better to out-of-distribution samples. These results suggest that modulating weights over low-dimensional manifolds offers a principled and effective alternative to traditional conditioning.
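To make the parameterization concrete, the sketch below illustrates the idea under our own assumptions; it is not the authors' implementation, and the class names `LineManifoldLinear` and `EllipseManifoldLinear` and all parameter names are hypothetical. It shows a linear layer whose weight matrix traces a straight line in weight space as a function of a scalar context c, plus a periodic variant tracing an ellipse for cyclic contexts such as rotation angle.

```python
# A minimal sketch (our illustration, not the paper's code) of weights that
# live on a low-dimensional manifold in weight space.
import math

import torch
import torch.nn as nn


class LineManifoldLinear(nn.Module):
    """Line topology: W(c) = W0 + c * dW.

    The scalar context c (e.g. the noise level of the input data)
    selects one point on the line as the effective weight matrix.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Two learnable anchors define the 1-D manifold (a straight line).
        self.W0 = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.dW = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.b0 = nn.Parameter(torch.zeros(out_features))
        self.db = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor, c: float) -> torch.Tensor:
        # Conditioning: pick the point on the manifold for this sub-task.
        W = self.W0 + c * self.dW
        b = self.b0 + c * self.db
        return x @ W.t() + b


class EllipseManifoldLinear(nn.Module):
    """Circle/ellipse topology: W(theta) = W0 + cos(theta)*A + sin(theta)*B.

    Periodic in theta, so it suits cyclic contexts such as image rotation.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.W0 = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.A = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.B = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.b = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor, theta: float) -> torch.Tensor:
        W = self.W0 + math.cos(theta) * self.A + math.sin(theta) * self.B
        return x @ W.t() + self.b


# Usage: the context scalar routes each batch through a different
# effective weight matrix, while gradients update the shared anchors.
layer = LineManifoldLinear(784, 10)
x = torch.randn(32, 784)
y = layer(x, c=0.3)  # c = the context value (e.g. noise level) for this batch
```

Because every context value shares the same learnable anchors (W0 and dW, or A and B), a gradient step taken at one point moves the entire manifold smoothly, which is the mechanism behind the cross-state generalization described above. Note that the abstract describes a more formal optimization of the manifold, minimizing a loss functional under a volumetric-movement constraint; the sketch above simply trains the anchors by ordinary backpropagation.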