Reinforcement Learning (RL) has opened up new opportunities to solve a wide range of complex decision-making tasks. However, modern RL algorithms, e.g., Deep Q-Learning, are based on deep neural networks, incurring high computational costs when running on edge devices. In this paper, we propose QHD, a hyperdimensional reinforcement learning algorithm that mimics brain properties to enable robust and real-time learning. QHD relies on a lightweight brain-inspired model to learn an optimal policy in an unknown environment. We first develop a novel mathematical foundation and encoding module that maps the state-action space into high-dimensional space. We accordingly develop a hyperdimensional regression model to approximate the Q-value function. The QHD-powered agent makes decisions by comparing the Q-values of all possible actions. We evaluate the effect of different RL training batch sizes and local memory capacities on QHD's quality of learning. QHD is capable of online learning with a tiny local memory capacity, which can be as small as the training batch size, and provides real-time learning by further decreasing the memory capacity and the batch size. This makes QHD suitable for highly efficient reinforcement learning in the edge environment, where supporting online and real-time learning is crucial. Our solution also supports a small experience replay batch size that provides a 12.3× speedup compared to DQN while ensuring minimal quality loss. Our evaluation shows QHD's capability for real-time learning, providing a 34.6× speedup and significantly better quality of learning than state-of-the-art deep RL algorithms.
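The hyperdimensional Q-value approximation described above can be sketched as follows. This is a minimal illustration, not the paper's exact design: the nonlinear random-projection encoder, the dimensionality `D`, the state size, and the learning rate are all assumptions chosen for the sketch.

```python
import numpy as np

D = 1000           # hypervector dimensionality (illustrative choice)
N_STATE = 4        # number of state features (assumed)
N_ACTIONS = 2      # number of discrete actions (assumed)

rng = np.random.default_rng(0)
# Random-projection encoder: one common HDC encoding scheme that maps a
# low-dimensional state into D-dimensional space (the paper's encoding
# module may differ in its details).
proj = rng.normal(size=(D, N_STATE))
phase = rng.uniform(0.0, 2.0 * np.pi, size=D)

def encode(state):
    """Map a state vector into a D-dimensional hypervector."""
    return np.cos(proj @ state + phase)

# Hyperdimensional regression: one model hypervector per action, with
# Q(s, a) approximated as a normalized dot product.
model = np.zeros((N_ACTIONS, D))

def q_values(state):
    return model @ encode(state) / D

def update(state, action, target, lr=0.1):
    """One regression step pulling Q(state, action) toward `target`."""
    h = encode(state)
    err = target - (model[action] @ h) / D
    model[action] += lr * err * h

def act(state):
    """The agent decides by comparing Q-values of each possible action."""
    return int(np.argmax(q_values(state)))
```

In a full agent, `target` would be a bootstrapped value such as `r + gamma * max(q_values(next_state))`, with transitions sampled from the small experience replay buffer the abstract refers to.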