Learning Dynamics Models for Model Predictive Agents

Model-Based Reinforcement Learning involves learning a dynamics model from data, and then using this model to optimise behaviour, most often with an online planner. Much of the recent research along these lines presents a particular set of design choices, involving problem definition, model learning and planning. Given the multiple contributions, it is difficult to evaluate the effects of each. This paper sets out to disambiguate the role of different design choices for learning dynamics models, by comparing their performance to planning with a ground-truth model (the simulator). First, we collect a rich dataset from the training sequence of a model-free agent on 5 domains of the DeepMind Control Suite. Second, we train feed-forward dynamics models in a supervised fashion, and evaluate planner performance while varying and analysing different model design choices, including ensembling, stochasticity, multi-step training and timestep size. Besides the quantitative analysis, we describe a set of qualitative findings, rules of thumb, and future research directions for planning with learned dynamics models. Videos of the results are available at https://sites.google.com/view/learning-better-models.
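The evaluation idea described above is to fit a dynamics model with supervised learning, then run the same planner once with the ground-truth simulator and once with the learned model and compare the returns. A minimal, standard-library-only Python sketch of that loop follows. Every component is a hypothetical stand-in, not the paper's setup:

- A 1-D point mass (`simulator_step`) replaces a Control Suite domain.
- Transitions are sampled at random instead of being taken from a model-free agent's training run.
- The dynamics model is a linear least-squares fit (`fit_linear_model`) rather than a feed-forward network.
- The planner is simple random-shooting MPC (`plan_random_shooting`), which may differ from the planner used in the paper.

```python
import random

# Hypothetical ground-truth simulator: a 1-D point mass with state (position, velocity).
# The action is a force clipped to [-1, 1]; DT is an assumed control timestep.
DT = 0.1

def simulator_step(state, action):
    pos, vel = state
    force = max(-1.0, min(1.0, action))
    new_vel = vel + DT * force
    return (pos + DT * new_vel, new_vel)

def reward(state):
    # Illustrative reward: stay close to the origin with small velocity.
    pos, vel = state
    return -(pos ** 2) - 0.1 * vel ** 2

def collect_random_transitions(n, seed=0):
    # Simplification: random (state, action) pairs instead of a model-free agent's data.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
        a = rng.uniform(-1.0, 1.0)
        data.append((s, a, simulator_step(s, a)))
    return data

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system A x = b.
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def features(state, action):
    return [state[0], state[1], action, 1.0]

def fit_linear_model(transitions, ridge=1e-6):
    # Supervised one-step regression: predict each next-state component from
    # (state, action) by ridge-regularised least squares (stand-in for a neural net).
    k = 4
    weights = []
    for out in range(2):
        XtX = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
        Xty = [0.0] * k
        for s, a, s_next in transitions:
            phi = features(s, a)
            for i in range(k):
                Xty[i] += phi[i] * s_next[out]
                for j in range(k):
                    XtX[i][j] += phi[i] * phi[j]
        weights.append(solve(XtX, Xty))

    def model_step(state, action):
        phi = features(state, max(-1.0, min(1.0, action)))
        return tuple(sum(w * f for w, f in zip(weights[out], phi)) for out in range(2))

    return model_step

def plan_random_shooting(step_fn, state, rng, horizon=10, num_samples=100):
    # Random-shooting MPC: sample action sequences, roll them out with step_fn,
    # and return the first action of the highest-return sequence.
    best_return, best_first = float("-inf"), 0.0
    for _ in range(num_samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s = step_fn(s, a)
            total += reward(s)
        if total > best_return:
            best_return, best_first = total, actions[0]
    return best_first

def run_episode(step_fn, seed, steps=50):
    # The planner queries step_fn (simulator or learned model); the environment
    # itself is always advanced with the ground-truth simulator.
    rng = random.Random(seed)
    state, total = (1.0, 0.0), 0.0
    for _ in range(steps):
        state = simulator_step(state, plan_random_shooting(step_fn, state, rng))
        total += reward(state)
    return total

if __name__ == "__main__":
    model_step = fit_linear_model(collect_random_transitions(500))
    print("planner with simulator:    ", run_episode(simulator_step, seed=1))
    print("planner with learned model:", run_episode(model_step, seed=1))
```

These toy dynamics are exactly linear, so the learned model is nearly perfect and the two returns should almost coincide. The design choices the paper studies (ensembling, stochasticity, multi-step training, timestep size) matter when the learned model cannot reproduce the true dynamics. This sketch does not attempt to reproduce that case.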

Related content

MoDELS

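MODELS, the 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience around modeling and model-driven software and systems. This year's edition will give the modeling community the opportunity to further advance the foundations of modeling. It will also present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website: http://www.modelsconference.org/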