The pursuit of energy-efficient and adaptive artificial intelligence (AI) has positioned neuromorphic computing as a promising alternative to conventional architectures. However, achieving learning on these platforms requires techniques that prioritize local information while still enabling effective credit assignment. Here, we propose noise-based reward-modulated learning (NRL), a novel synaptic plasticity rule that mathematically unifies reinforcement learning and gradient-based optimization with biologically inspired local updates. NRL addresses the computational bottleneck of exact gradients by approximating them through stochastic neural activity, transforming the inherent noise of biological and neuromorphic substrates into a functional resource. Drawing inspiration from biological learning, our method uses reward prediction errors as its optimization target to generate increasingly advantageous behavior, and eligibility traces to facilitate retrospective credit assignment. Experimental validation on reinforcement learning tasks with immediate and delayed rewards shows that NRL matches the performance of baselines optimized with backpropagation, albeit with slower convergence, and that it significantly outperforms reward-modulated Hebbian learning (RMHL), the most prominent comparable approach, in both performance and scalability on multi-layer networks. Although tested only on simple architectures, these results highlight the potential of noise-driven, brain-inspired learning for low-power adaptive systems, particularly on computing substrates with locality constraints. NRL offers a theoretically grounded paradigm well suited to the event-driven characteristics of next-generation neuromorphic AI.