State-Space Models (SSMs) have emerged as efficient alternatives to computationally intensive architectures such as Transformers, particularly for sequence modeling. However, a fundamental challenge in their training is the reliance on static loss functions, which may not be optimal across all learning stages. To address this issue, this paper proposes a hybrid model that integrates the Hyena architecture with a Dynamic Loss Network (DLN), guided by a Learn-to-Teach approach (L2T-DLN). In this framework, the Hyena model acts as the student, and its loss function is optimized adaptively. A teacher model, leveraging a memory of the student's past performance, guides the DLN in dynamically balancing the primary cross-entropy loss against a regularization term. Experiments on the Penn Treebank (PTB) dataset show that our approach significantly improves language modeling performance: the proposed model achieved a validation perplexity of 102.6, a notable improvement over the 110.4 of a baseline Hyena model trained with a static loss function. This research indicates that combining SSMs with an adaptive loss function markedly enhances the quality and efficiency of deep learning models for sequential data, showing potential for applications in Natural Language Processing (NLP), time-series analysis, and biological signal processing.
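To make the loss-balancing idea concrete, the sketch below shows one minimal way a DLN could map a memory of the student's recent losses to mixing weights over the cross-entropy and regularization terms. This is an illustrative PyTorch sketch, not the paper's implementation; the class name, network shape, and `history_size` parameter are assumptions, and the teacher's update of the DLN is omitted.

```python
# Minimal sketch of a dynamic loss network, assuming PyTorch.
# All names (DynamicLossNetwork, history_size) are illustrative.
import torch
import torch.nn as nn

class DynamicLossNetwork(nn.Module):
    """Maps a memory of recent student losses to two mixing weights."""
    def __init__(self, history_size: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history_size, 16),
            nn.ReLU(),
            nn.Linear(16, 2),
            nn.Softmax(dim=-1),  # the two weights sum to 1
        )

    def forward(self, loss_history: torch.Tensor) -> torch.Tensor:
        return self.net(loss_history)

def combined_loss(logits, targets, reg_term, loss_history, dln):
    """Blend cross-entropy and a regularization term with DLN weights."""
    ce = nn.functional.cross_entropy(logits, targets)
    w = dln(loss_history)  # w = [w_ce, w_reg]
    return w[0] * ce + w[1] * reg_term

# Usage: the student minimizes the blended loss; in the full L2T-DLN
# setup, a teacher model would update the DLN from the student's
# performance signal (omitted here).
dln = DynamicLossNetwork()
logits = torch.randn(4, 100, requires_grad=True)  # batch of 4, vocab of 100
targets = torch.randint(0, 100, (4,))
reg = torch.tensor(0.01)      # e.g. an L2 penalty on student weights
history = torch.randn(8)      # memory of past per-step losses
loss = combined_loss(logits, targets, reg, history, dln)
loss.backward()
```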