A Survey on Causal Reinforcement Learning

While Reinforcement Learning (RL) has achieved tremendous success in sequential decision-making problems across many domains, it still faces the key challenges of data inefficiency and a lack of interpretability. Many researchers have recently drawn on insights from the causality literature, producing a growing body of work that combines the merits of causality with RL to address these challenges. It is therefore both necessary and valuable to collate these Causal Reinforcement Learning (CRL) works, review CRL methods, and investigate what causality can contribute to RL. In particular, we divide existing CRL approaches into two categories according to whether their causality-based information is given in advance or not. We further analyze each category in terms of the formalization of different models, covering the Markov Decision Process (MDP), the Partially Observable Markov Decision Process (POMDP), Multi-Armed Bandits (MAB), and the Dynamic Treatment Regime (DTR). Moreover, we summarize evaluation metrics and open-source resources, and we discuss emerging applications along with promising prospects for the future development of CRL.
