The role of hidden units in recurrent neural networks is typically seen as modeling memory, with research focusing on enhancing information retention through gating mechanisms. A less explored perspective views hidden units as active participants in the computation performed by the network, rather than passive memory stores. In this work, we revisit bilinear operations, which involve multiplicative interactions between hidden units and input embeddings. We demonstrate theoretically and empirically that they constitute a natural inductive bias for representing the evolution of hidden states in state tracking tasks. These are the simplest type of tasks that require hidden units to actively contribute to the behavior of the network. We also show that bilinear state updates form a natural hierarchy corresponding to state tracking tasks of increasing complexity, with popular linear recurrent networks such as Mamba residing at the lowest-complexity center of that hierarchy.
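To make the notion of a bilinear state update concrete, here is a minimal sketch (not the paper's implementation; all sizes and the tensor `W` are illustrative assumptions): the transition matrix applied to the hidden state is itself a linear function of the input embedding, so input and hidden units interact multiplicatively.

```python
import numpy as np

rng = np.random.default_rng(0)

d_h, d_x, T = 4, 3, 5  # hidden size, input size, sequence length (illustrative)

# Bilinear recurrence: h_t[i] = sum_{j,k} W[i, j, k] * x_t[j] * h_{t-1}[k].
# The update is linear in x_t and linear in h_{t-1}, hence bilinear.
W = rng.normal(size=(d_h, d_x, d_h)) / np.sqrt(d_x * d_h)

h = np.zeros(d_h)
h[0] = 1.0  # initial hidden state
for t in range(T):
    x_t = rng.normal(size=d_x)
    # Input-dependent transition matrix A(x_t) = sum_j x_t[j] * W[:, j, :]
    A_t = np.einsum('ijk,j->ik', W, x_t)
    h = A_t @ h
```

Restricting `A_t` to a diagonal (elementwise) matrix recovers the structure of linear recurrent models such as Mamba, which is the low-complexity special case the abstract refers to; a full input-dependent matrix can represent richer state-tracking dynamics.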