Continuous Control with Action Quantization from Demonstrations

In this paper, we propose a novel Reinforcement Learning (RL) framework for problems with continuous action spaces: Action Quantization from Demonstrations (AQuaDem). The proposed approach consists in learning a discretization of continuous action spaces from human demonstrations. This discretization returns a set of plausible actions (in light of the demonstrations) for each input state, thus capturing the priors of the demonstrator and their multimodal behavior. By discretizing the action space, any discrete action deep RL technique can be readily applied to the continuous control problem. Experiments show that the proposed approach outperforms state-of-the-art methods such as SAC in the RL setup, and GAIL in the Imitation Learning setup. We provide a website with interactive videos: https://google-research.github.io/aquadem/ and make the code available: https://github.com/google-research/google-research/tree/master/aquadem.
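The idea in the abstract can be illustrated with a deliberately simplified toy sketch. Everything in it is an assumption made for illustration rather than the released AQuaDem code: the candidates are state-independent scalars instead of the output of a state-conditioned network, the soft-minimum loss is one plausible objective for fitting several candidates to multimodal demonstrations (the abstract does not specify the training objective), and the names `soft_min_loss`, `fit_candidates`, and `to_continuous` are hypothetical.

```python
import math


def soft_min_loss(candidates, demo_action, temperature):
    """Soft minimum over candidates of the squared distance to a demo action.

    Assumed objective: -T * log(sum_k exp(-d_k / T)), computed stably.
    """
    sq_dists = [(c - demo_action) ** 2 for c in candidates]
    m = min(sq_dists)
    return m - temperature * math.log(
        sum(math.exp(-(d - m) / temperature) for d in sq_dists)
    )


def loss_gradient(candidates, demo_action, temperature):
    """Gradient of soft_min_loss with respect to each candidate."""
    sq_dists = [(c - demo_action) ** 2 for c in candidates]
    m = min(sq_dists)
    weights = [math.exp(-(d - m) / temperature) for d in sq_dists]
    total = sum(weights)
    # Softmax weight of candidate k times the derivative of its squared distance.
    return [w / total * 2.0 * (c - demo_action)
            for w, c in zip(weights, candidates)]


def fit_candidates(demo_actions, init, temperature=0.1, lr=0.05, steps=500):
    """Fit K candidate actions to demonstrations by plain gradient descent."""
    candidates = list(init)
    n = len(demo_actions)
    for _ in range(steps):
        grads = [0.0] * len(candidates)
        for a in demo_actions:
            g = loss_gradient(candidates, a, temperature)
            grads = [gi + gj / n for gi, gj in zip(grads, g)]
        candidates = [c - lr * g for c, g in zip(candidates, grads)]
    return candidates


def to_continuous(candidates, discrete_index):
    """Map the index chosen by a discrete-action agent to a continuous action."""
    return candidates[discrete_index]


# Bimodal toy demonstrations: the demonstrator acts either at -1 or at +1.
demo_actions = [-1.0, 1.0, -1.0, 1.0]
candidates = fit_candidates(demo_actions, init=[-0.1, 0.2])
```

In this toy setting each candidate drifts toward one mode of the demonstrations instead of averaging them, which is the multimodality the abstract refers to. A discrete-action agent would then choose an index, and `to_continuous` would translate that index into the continuous action sent to the environment.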

