Recurrent neural architectures such as LSTM and GRU remain widely used in sequence modeling, but they face two core limitations: redundant gate-specific parameters and a reduced ability to retain information across long temporal distances. This paper introduces the Quantum-Leap LSTM (QL-LSTM), a recurrent architecture that addresses both challenges through two independent components. The Parameter-Shared Unified Gating (PSUG) mechanism replaces all gate-specific transformations with a single shared weight matrix, reducing parameters by approximately 48 percent while preserving full gating behavior. The Hierarchical Gated Recurrence with Additive Skip Connections (HGR-ASC) component adds a multiplication-free pathway that improves long-range information flow and mitigates forget-gate degradation. We evaluate QL-LSTM on sentiment classification using the IMDB dataset with extended document lengths, comparing it against LSTM, GRU, and BiLSTM reference models. QL-LSTM achieves competitive accuracy while using substantially fewer parameters. Although the PSUG and HGR-ASC components are more efficient per time step, the current prototype remains bound by the inherently sequential nature of recurrent models and therefore does not yet yield wall-clock speed improvements without further kernel-level optimization.
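To make the two ideas concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract, not the paper's actual implementation. It assumes PSUG means a single shared affine projection whose output feeds all four gates through gate-specific activations and biases, and it assumes HGR-ASC means an unweighted additive skip on the cell state. The class name QLLSTMCell and all parameter names are illustrative, and the shared-projection form shown here reduces gate weights more aggressively than the ~48 percent the abstract reports, so the real PSUG presumably shares only part of the transformation.

```python
# Hypothetical sketch of the two QL-LSTM ideas described in the abstract.
# All names here are illustrative; the exact PSUG and HGR-ASC formulations
# are assumptions inferred from the abstract's wording alone.
import torch
import torch.nn as nn


class QLLSTMCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # PSUG (assumed form): one shared affine map replaces the four
        # gate-specific weight matrices of a standard LSTM cell.
        self.shared = nn.Linear(input_size + hidden_size, hidden_size)
        # Gate identity is kept with cheap per-gate bias vectors.
        self.b_i = nn.Parameter(torch.zeros(hidden_size))
        self.b_f = nn.Parameter(torch.ones(hidden_size))  # forget-bias init
        self.b_g = nn.Parameter(torch.zeros(hidden_size))
        self.b_o = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, x, state):
        h_prev, c_prev = state
        # A single matmul per step instead of four gate-specific ones.
        s = self.shared(torch.cat([x, h_prev], dim=-1))
        i = torch.sigmoid(s + self.b_i)  # input gate
        f = torch.sigmoid(s + self.b_f)  # forget gate
        g = torch.tanh(s + self.b_g)     # candidate state
        o = torch.sigmoid(s + self.b_o)  # output gate
        # HGR-ASC (assumed form): an additive, multiplication-free skip path
        # lets c_{t-1} flow forward even when the forget gate decays toward
        # zero. A real implementation would likely need normalization or
        # damping to keep this sum stable over long sequences.
        c = f * c_prev + i * g + c_prev
        h = o * torch.tanh(c)
        return h, (h, c)


# Usage sketch: one step over a batch of 4 with 16-dim inputs, 32-dim state.
cell = QLLSTMCell(16, 32)
x = torch.randn(4, 16)
h0 = c0 = torch.zeros(4, 32)
out, state = cell(x, (h0, c0))
```

Even under these assumptions, the sketch illustrates why the prototype gains no wall-clock speed: each step still depends on the previous hidden state, so the recurrence stays sequential regardless of how few parameters each step touches.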