In real-world settings, agents face numerous constraints that are hard to specify mathematically. For reinforcement learning (RL) to be deployed in the real world, it is therefore critical that RL agents are aware of these constraints so that they can act safely. In this work, we consider the problem of learning constraints from demonstrations of a constraint-abiding agent's behavior. Previous works in this regard have mainly been restricted to tabular (discrete) settings, to specific types of constraints, or have assumed knowledge of the environment's transition dynamics. In contrast, our framework can learn arbitrary \textit{Markovian} constraints in high-dimensional settings in a completely model-free manner. We experimentally validate our approach and show that our framework successfully learns the most likely constraints that the agent respects. We further show that these learned constraints are \textit{transferable} to new agents that may have different morphologies and/or reward functions. The code can be found at: \url{https://github.com/shehryar-malik/icrl}.