Improved Exploring Starts by Kernel Density Estimation-Based State-Space Coverage Acceleration in Reinforcement Learning

Reinforcement learning (RL) is currently a popular research topic in control engineering and has the potential to make its way into industrial and commercial applications. The corresponding RL controllers are trained in direct interaction with the controlled system, which makes them data-driven, performance-oriented solutions. The best practice of exploring starts (ES) is used by default to support the learning process via randomly picked initial states. However, this method might deliver strongly biased results if the system's dynamics and constraints lead to unfavorable sample distributions in the state space (e.g., condensed sample accumulation in certain state-space areas). To overcome this issue, a kernel density estimation-based state-space coverage acceleration (DESSCA) is proposed, which improves the ES concept by prioritizing infrequently visited states for a more balanced coverage of the state space during training. The considered test scenarios are the mountain car, cartpole, and electric motor control environments. Using DQN and DDPG as exemplary RL algorithms, it is shown that DESSCA is a simple yet effective algorithmic extension to the established ES approach.
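The idea of prioritizing infrequently visited states can be illustrated with a minimal sketch. It is not the authors' DESSCA implementation. It assumes states normalized to a box, an isotropic Gaussian kernel with a fixed bandwidth, and a simple selection rule that draws uniform candidates and keeps the one with the lowest estimated visit density. The names `kde_density` and `propose_initial_state`, the bandwidth, and the candidate count are hypothetical choices made for illustration.

```python
import numpy as np


def kde_density(query, visited, bandwidth=0.1):
    """Gaussian kernel density estimate of `visited`, evaluated at `query`.

    query: array of shape (m, d); visited: array of shape (n, d).
    Returns an array of shape (m,). The fixed isotropic bandwidth is an assumption.
    """
    query = np.atleast_2d(np.asarray(query, dtype=float))
    visited = np.atleast_2d(np.asarray(visited, dtype=float))
    n, d = visited.shape
    diff = query[:, None, :] - visited[None, :, :]   # pairwise differences, (m, n, d)
    sq_dist = np.sum(diff ** 2, axis=-1)             # squared distances, (m, n)
    norm = n * (2.0 * np.pi) ** (d / 2.0) * bandwidth ** d
    return np.exp(-0.5 * sq_dist / bandwidth ** 2).sum(axis=1) / norm


def propose_initial_state(visited, low, high, rng, n_candidates=256, bandwidth=0.1):
    """Pick an initial state from the least densely visited region (illustrative rule)."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    # Uniform candidates inside the state-space box [low, high].
    candidates = rng.uniform(low, high, size=(n_candidates, low.size))
    if len(visited) == 0:
        # Nothing visited yet: fall back to a plain exploring start.
        return candidates[0]
    density = kde_density(candidates, visited, bandwidth)
    return candidates[np.argmin(density)]


# Usage: past samples cluster in one corner of a normalized 2-D state space.
rng = np.random.default_rng(0)
center = np.array([-0.8, -0.8])
visited = np.clip(rng.normal(center, 0.05, size=(200, 2)), -1.0, 1.0)
start = propose_initial_state(visited, low=[-1.0, -1.0], high=[1.0, 1.0], rng=rng)
```

In a training loop, the proposed state would replace the uniformly drawn initial state at each episode reset, and the states encountered during the episode would be appended to `visited`. Whether an environment allows setting its initial state this way depends on its interface.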
