In many real-world reinforcement learning (RL) problems, the environment exhibits inherent symmetries that can be exploited to improve learning efficiency. This paper develops a theoretical and algorithmic framework for incorporating known group symmetries into kernel-based RL. We propose a symmetry-aware variant of optimistic least-squares value iteration (LSVI), which leverages invariant kernels to encode invariance in both rewards and transition dynamics. Our analysis establishes new bounds on the maximum information gain and covering numbers for invariant RKHSs, explicitly quantifying the sample-efficiency gains from symmetry. Empirical results on a customized Frozen Lake environment and a 2D placement design problem confirm the theoretical improvements, demonstrating that symmetry-aware RL algorithms achieve significantly better performance than their standard kernel counterparts. These findings highlight the value of structural priors in designing more sample-efficient reinforcement learning algorithms.
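To make the invariance construction concrete, the sketch below illustrates one standard way to build an invariant kernel from a base kernel by averaging over the group action; this is a minimal illustration of the general technique, not the paper's implementation, and the reflection group and RBF base kernel are assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=1.0):
    """Standard RBF base kernel (illustrative choice)."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * lengthscale ** 2))

def invariant_kernel(x, y, group, base_kernel=rbf_kernel):
    """Group-averaged kernel: k_G(x, y) = (1/|G|) * sum_g k(x, g(y)).

    Averaging the base kernel over a finite group action yields a kernel
    whose RKHS contains only G-invariant functions, which is the mechanism
    by which symmetry in rewards and dynamics can be encoded.
    """
    return np.mean([base_kernel(x, g(y)) for g in group])

# Hypothetical example: a reflection symmetry on R^2, represented by the
# two-element group {identity, negation}.
group = [lambda z: z, lambda z: -z]

x = np.array([1.0, 2.0])
y = np.array([-1.0, -2.0])

# The averaged kernel cannot distinguish a point from its reflection,
# so both calls below return the same value.
print(invariant_kernel(x, y, group))
print(invariant_kernel(x, -y, group))
```

Because the invariant kernel identifies states related by the group action, the effective size of the hypothesis space shrinks, which is the intuition behind the improved information-gain and covering-number bounds described above.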