Reinforcement Learning in Practice: Opportunities and Challenges

This article is a gentle discussion of reinforcement learning in practice, its opportunities and its challenges. It touches on a broad range of topics and focuses on perspectives rather than technical details. The article is based on both historical and recent research papers, surveys, tutorials, talks, blogs, books, (panel) discussions, and workshops/conferences. Various groups of readers, such as researchers, engineers, students, managers, investors, officers, and anyone wanting to know more about the field, may find the article interesting. We first give a brief introduction to reinforcement learning (RL) and its relationship with deep learning, machine learning, and AI. Then we discuss opportunities for RL, in particular in products and services, games, bandits, recommender systems, robotics, transportation, finance and economics, healthcare, education, combinatorial optimization, computer systems, and science and engineering. Next we discuss challenges, in particular: 1) foundation, 2) representation, 3) reward, 4) exploration, 5) model, simulation, planning, and benchmarks, 6) off-policy/offline learning, 7) learning to learn, a.k.a. meta-learning, 8) explainability and interpretability, 9) constraints, 10) software development and deployment, 11) business perspectives, and 12) more challenges. We conclude with a discussion that attempts to answer two questions: "Why has RL not been widely adopted in practice yet?" and "When is RL helpful?"

翻译：文章以历史和最近的研究论文、调查、辅导、演讲、博客、书籍、(小组)讨论和讲习班/会议为基础。各种读者群体,例如研究人员、工程师、学生、管理人员、投资者、官员和希望了解更多实地情况的人,可能会发现文章有趣。在本篇文章中,我们首先简要地介绍强化学习(RL)及其与深层学习、机器学习和AI的关系。然后我们讨论产品和服务的机会,特别是:游戏、土匪、推荐系统、机器人、运输、金融和经济系统、保健、教育、组合优化、计算机系统、科学和工程。然后我们讨论挑战,特别是:(1)基础,(2)代表性,(3)奖励,(4)勘探,(5)模型,模拟,规划和基准,(6)非政策/脱节学习,(7)学习a.k.a.元学习,8) 定义和解释性更强的软件限制,“我们从11角度理解和解释性 ”,我们从一个更加及时的软件和解释性的角度,从10 结束10 “我们从一个更难的角度看,从一个更难理解和解释性的观点,从一个 " 理解一个更难理解和解释性的挑战 " 。