Continual learning on edge platforms remains challenging because recurrent networks depend on energy-intensive training procedures and frequent data movement, both of which are impractical for embedded deployments. This work introduces M2RU, a mixed-signal architecture that implements the minion recurrent unit for efficient temporal processing with on-chip continual learning. The architecture integrates weighted-bit streaming, which lets crossbars process multi-bit digital inputs without high-resolution conversion, and an experience replay mechanism that stabilizes learning under domain shifts. M2RU achieves 15 GOPS at 48.62 mW, corresponding to 312 GOPS per watt, and maintains accuracy within 5 percent of software baselines on sequential MNIST and CIFAR-10 tasks. Compared with a CMOS digital design, the accelerator provides a 29× improvement in energy efficiency. Device-aware analysis shows an expected operational lifetime of 12.2 years under continual learning workloads. These results establish M2RU as a scalable and energy-efficient platform for real-time adaptation in edge-level temporal intelligence.
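The idea behind weighted-bit streaming can be illustrated with a small numerical sketch. The abstract does not describe the circuit, so the code below assumes the generic bit-serial scheme: each multi-bit input is split into bit planes, every plane is applied to the crossbar as a binary vector, and the per-plane column sums are scaled by the bit weight 2^b and accumulated. The function names (`bit_planes`, `crossbar_mvm`, `weighted_bit_streaming_mvm`), the ideal noise-free crossbar model, and the example values are all hypothetical and are not taken from M2RU itself.

```python
def bit_planes(x, n_bits):
    """Split unsigned integers into binary planes, least significant bit first."""
    return [[(v >> b) & 1 for v in x] for b in range(n_bits)]


def crossbar_mvm(weights, inputs):
    """Ideal crossbar model: weights[j] holds the conductances of column j,
    and each column output is the sum of input * conductance."""
    return [sum(w * i for w, i in zip(col, inputs)) for col in weights]


def weighted_bit_streaming_mvm(weights, x, n_bits):
    """Stream x one bit plane at a time and combine the partial column sums
    with binary weights (assumed scheme, not the paper's exact circuit)."""
    acc = [0.0] * len(weights)
    for b, plane in enumerate(bit_planes(x, n_bits)):
        partial = crossbar_mvm(weights, plane)  # binary inputs only
        acc = [a + (2 ** b) * p for a, p in zip(acc, partial)]
    return acc


# Hypothetical 3-input, 2-column crossbar and 4-bit inputs.
weights = [[0.5, -1.0, 2.0], [1.0, 0.25, -0.5]]
x = [3, 5, 10]

streamed = weighted_bit_streaming_mvm(weights, x, n_bits=4)
direct = crossbar_mvm(weights, x)  # reference using full-precision inputs
print(streamed, direct)
```

In this idealized model the streamed result equals the direct multi-bit product, which is why the inputs never need to pass through a high-resolution converter: the array only ever sees binary inputs, and the bit significance is restored during accumulation.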