Reinforcement Learning is a powerful tool for modeling decision-making processes. However, it relies on an exploration-exploitation trade-off that remains an open challenge for many tasks. In this work, we study neighboring-state-based, model-free exploration, guided by the intuition that, for an early-stage agent, considering actions derived from a bounded region of nearby states may lead to better exploratory choices. We propose two algorithms that choose exploratory actions based on a survey of nearby states, and find that one of our methods, ${\rho}$-explore, consistently outperforms the Double DQN baseline in a discrete environment by 49% in terms of Eval Reward Return.
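The abstract only sketches the idea of surveying nearby states; as a rough illustration (not the authors' exact ${\rho}$-explore procedure), an exploration step in this spirit might look like the sketch below. The radius $\rho$, the number of sampled neighbors, the uniform $L_\infty$ perturbation, and the majority-vote aggregation are all assumptions introduced here for illustration.

```python
import numpy as np

def neighbor_survey_action(q_net, state, rho=0.1, n_samples=8, epsilon=0.1, rng=None):
    """Hypothetical neighboring-state exploration step (illustrative, not the
    paper's algorithm): with probability epsilon, survey states perturbed within
    an L-infinity ball of radius rho and return the majority greedy action among
    them; otherwise act greedily on the current state. `q_net(s)` is assumed to
    return a vector of Q-values over discrete actions."""
    rng = rng or np.random.default_rng()
    state = np.asarray(state, dtype=float)
    q_values = np.asarray(q_net(state))
    if rng.random() >= epsilon:
        return int(np.argmax(q_values))  # exploit: greedy action at the true state
    # Explore: sample nearby states and collect their greedy actions.
    noise = rng.uniform(-rho, rho, size=(n_samples, state.shape[0]))
    neighbor_actions = [int(np.argmax(q_net(state + d))) for d in noise]
    # Aggregate by majority vote over the surveyed neighbors.
    return int(np.bincount(neighbor_actions, minlength=len(q_values)).argmax())
```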