Exploiting Transformer in Reinforcement Learning for Interpretable Temporal Logic Motion Planning

Automaton-based approaches have enabled robots to perform a variety of complex tasks. However, most existing automaton-based algorithms rely heavily on state representations that are manually customized for the task at hand, which limits their applicability in deep reinforcement learning. To address this issue, we incorporate the Transformer into reinforcement learning and develop a Double-Transformer-guided Temporal Logic framework (T2TL) that exploits the structural features of the Transformer twice: first, the LTL instruction is encoded by a Transformer module so that task instructions are understood efficiently during training; second, the context variable is encoded by another Transformer to improve task performance. In particular, the LTL instruction is specified in co-safe LTL. LTL progression, a semantics-preserving rewriting operation, is used to decompose the complex task into learnable sub-goals. This not only converts the non-Markovian reward decision process into a Markovian one, but also improves sampling efficiency through simultaneous learning of multiple sub-tasks. An environment-agnostic LTL pre-training scheme is further incorporated to facilitate learning of the Transformer module, resulting in an improved representation of LTL. Simulation and experimental results demonstrate the effectiveness of the T2TL framework.
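To make the role of LTL progression more concrete, the following is a minimal Python sketch of the standard progression rules on a small co-safe fragment. It is not the T2TL implementation: the tuple encoding of formulas, the helper names (`progress`, `reward`, `_and`, `_or`), and the +1/-1/0 reward convention are all assumptions made for illustration. The point it shows is that, after each step, the remaining task is rewritten into a new formula, so the pair (environment state, current formula) carries everything the reward depends on.

```python
# Hypothetical encoding (not from the paper): formulas are nested tuples.
#   "True" / "False"            constants
#   "a"                         atomic proposition (any other string)
#   ("not", "a")                negated proposition
#   ("and", f, g), ("or", f, g)
#   ("next", f), ("eventually", f), ("until", f, g)


def _and(f, g):
    """Conjunction with basic constant simplification."""
    if f == "False" or g == "False":
        return "False"
    if f == "True":
        return g
    if g == "True":
        return f
    return f if f == g else ("and", f, g)


def _or(f, g):
    """Disjunction with basic constant simplification."""
    if f == "True" or g == "True":
        return "True"
    if f == "False":
        return g
    if g == "False":
        return f
    return f if f == g else ("or", f, g)


def progress(formula, true_props):
    """Rewrite `formula` given the set of propositions true at this step."""
    if formula in ("True", "False"):
        return formula
    if isinstance(formula, str):  # atomic proposition
        return "True" if formula in true_props else "False"
    op = formula[0]
    if op == "not":
        return "False" if formula[1] in true_props else "True"
    if op == "and":
        return _and(progress(formula[1], true_props),
                    progress(formula[2], true_props))
    if op == "or":
        return _or(progress(formula[1], true_props),
                   progress(formula[2], true_props))
    if op == "next":
        return formula[1]
    if op == "eventually":  # prog(F f) = prog(f) or F f
        return _or(progress(formula[1], true_props), formula)
    if op == "until":       # prog(f U g) = prog(g) or (prog(f) and f U g)
        f, g = formula[1], formula[2]
        return _or(progress(g, true_props),
                   _and(progress(f, true_props), formula))
    raise ValueError(f"unknown operator: {op!r}")


def reward(formula):
    """Assumed convention: +1 when satisfied, -1 when falsified, else 0."""
    return {"True": 1.0, "False": -1.0}.get(formula, 0.0)


# "Eventually reach a, and after that eventually reach b."
task = ("eventually", ("and", "a", ("eventually", "b")))
step1 = progress(task, set())    # nothing observed: task is unchanged
step2 = progress(step1, {"a"})   # a observed: a branch "eventually b" appears
step3 = progress(step2, {"b"})   # b observed: task is satisfied
```

In this sketch the sub-goals appear as the progressed formulas themselves. Since each one is again an LTL formula, the same encoder can in principle be applied to it, which is one way to read the abstract's claim about learning multiple sub-tasks simultaneously.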
