Sample-Efficient Multi-Agent Reinforcement Learning with Demonstrations for Flocking Control


Learning · Agent · Controller · Reinforcement Learning · Processing (programming language)

September 17, 2022


Yunbo Qiu, Yuzhu Zhan, Yue Jin, Jian Wang, Xudong Zhang

From arXiv; accepted by the IEEE Vehicular Technology Conference (VTC) 2022-Fall

Flocking control is a significant problem in multi-agent systems such as multi-agent unmanned aerial vehicles and multi-agent autonomous underwater vehicles, where it enhances the cooperativity and safety of agents. In contrast to traditional methods, multi-agent reinforcement learning (MARL) solves the flocking control problem more flexibly. However, MARL-based methods suffer from sample inefficiency, since they must collect a huge number of experiences from interactions between agents and the environment. We propose a novel method, Pretraining with Demonstrations for MARL (PwD-MARL), which uses non-expert demonstrations collected in advance with traditional methods to pretrain agents. During pretraining, agents learn policies from demonstrations through MARL and behavior cloning simultaneously, and are prevented from overfitting the demonstrations. By pretraining with non-expert demonstrations, PwD-MARL gives online MARL a warm start and thereby improves its sample efficiency. Experiments show that PwD-MARL improves sample efficiency and policy performance in flocking control, even with poor-quality or few demonstrations.
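The abstract says that pretraining combines a reinforcement learning objective with behavior cloning on the demonstrations. The sketch below only illustrates that general idea and is not the paper's actual loss: it assumes continuous actions, a squared temporal-difference error as the RL term, a mean squared error as the behavior cloning term, and a fixed weight between them. The names `Transition`, `td_loss`, `bc_loss`, `pretrain_loss`, `bc_weight`, and `next_q` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Transition:
    """One demonstration step for a single agent (hypothetical layout)."""
    action: Sequence[float]  # action taken by the demonstrator
    reward: float            # reward observed after that action
    next_q: float            # assumed value estimate of the next state


def td_loss(q_value: float, reward: float, next_q: float, gamma: float = 0.99) -> float:
    """Squared one-step temporal-difference error (the RL term)."""
    target = reward + gamma * next_q
    return (q_value - target) ** 2


def bc_loss(policy_action: Sequence[float], demo_action: Sequence[float]) -> float:
    """Mean squared error between the policy's action and the demonstrated one."""
    return sum((p - d) ** 2 for p, d in zip(policy_action, demo_action)) / len(demo_action)


def pretrain_loss(q_value: float, policy_action: Sequence[float], t: Transition,
                  bc_weight: float = 0.5, gamma: float = 0.99) -> float:
    """Weighted sum of the RL term and the behavior cloning term.

    bc_weight is an assumed knob: a smaller value makes the agent imitate
    the (possibly non-expert) demonstrations less closely.
    """
    return td_loss(q_value, t.reward, t.next_q, gamma) + bc_weight * bc_loss(policy_action, t.action)


if __name__ == "__main__":
    step = Transition(action=[1.0, 0.0], reward=1.0, next_q=0.0)
    print(pretrain_loss(q_value=0.0, policy_action=[0.0, 0.0], t=step))
```

In a real training loop these scalars would come from neural networks and the sum would be minimized by gradient descent. How the method keeps agents from overfitting the demonstrations is not described in the abstract, so the fixed `bc_weight` here is only a stand-in for that mechanism.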


