In this work, we study value function approximation in reinforcement learning (RL) problems with high-dimensional state or action spaces via a generalized version of representation policy iteration (RPI). We consider the limitations of proto-value functions (PVFs) in accurately approximating the value function in low dimensions, and we highlight the importance of feature learning for improved low-dimensional value function approximation. We then adopt different representation learning algorithms on graphs to learn the basis functions that best represent the value function. We empirically show that node2vec, an algorithm for scalable feature learning in networks, and the Variational Graph Auto-Encoder consistently outperform the commonly used smooth proto-value functions in low-dimensional feature space.