This paper considers the problem of learning a model in model-based reinforcement learning (MBRL). We examine how the planning module of an MBRL algorithm uses the model, and propose that the model learning module should incorporate the way the planner is going to use the model. This is in contrast to conventional model learning approaches, such as those based on maximum likelihood estimation, that learn a predictive model of the environment without explicitly considering the interaction between the model and the planner. We focus on policy gradient-type planning algorithms and derive new loss functions for model learning that incorporate how the planner uses the model. We call this approach Policy-Aware Model Learning (PAML). We theoretically analyze a generic model-based policy gradient algorithm and provide a convergence guarantee for the optimized policy. We also empirically evaluate PAML on several benchmark problems, showing promising results.