Persistent Rule-based Interactive Reinforcement Learning

Interactive reinforcement learning speeds up learning in autonomous agents by having a human trainer provide additional information to the agent in real time. Current interactive reinforcement learning research has been limited to real-time interactions in which the trainer's advice is relevant only to the current state. Additionally, the information provided in each interaction is not retained; the agent discards it after a single use. In this work, we propose a persistent rule-based interactive reinforcement learning approach, i.e., a method for retaining and reusing provided knowledge that allows trainers to give general advice relevant to more than just the current state. Our experimental results show that persistent advice substantially improves the agent's performance while reducing the number of interactions required of the trainer. Moreover, rule-based advice has a performance impact similar to that of state-based advice, but with a substantially lower interaction count.
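To make the idea of retained, rule-based advice concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a hypothetical grid-world state of the form (x, y), a tabular Q-value dictionary, and a simplified policy in which stored advice is always followed when a rule matches. The names PersistentAdvice, add_rule, lookup, and choose_action are illustrative only.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, int]   # hypothetical grid-world state (x, y)
Action = str
Rule = Tuple[Callable[[State], bool], Action]


class PersistentAdvice:
    """Keeps trainer advice as rules so it can be reused in later states."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []
        self.interactions = 0  # how many times the trainer gave advice

    def add_rule(self, condition: Callable[[State], bool], action: Action) -> None:
        # One interaction stores a rule that may cover many states.
        self.rules.append((condition, action))
        self.interactions += 1

    def lookup(self, state: State) -> Optional[Action]:
        # Return the action of the first stored rule that matches, if any.
        for condition, action in self.rules:
            if condition(state):
                return action
        return None


def choose_action(
    state: State,
    q_values: Dict[Tuple[State, Action], float],
    actions: List[Action],
    advice: PersistentAdvice,
    epsilon: float,
    rng: random.Random,
) -> Action:
    # Assumption: retained advice always overrides the agent's own policy.
    advised = advice.lookup(state)
    if advised is not None:
        return advised
    # Otherwise fall back to ordinary epsilon-greedy selection.
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))
```

In this sketch, a single call such as advice.add_rule(lambda s: s[0] < 3, "right") covers every state with x below 3, whereas state-based advice would need one interaction per state. That is the sense in which rule-based advice can reduce the interaction count.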
