Ensemble Value Functions for Efficient Exploration in Multi-Agent Reinforcement Learning


Agents · Multi-Agent Reinforcement Learning · Multi-Agent · Value Functions · Reinforcement Learning

April 16, 2023


Lukas Schäfer, Oliver Slumbers, Stephen McAleer, Yali Du, Stefano V. Albrecht, David Mguni

From arXiv. Presented at the Adaptive and Learning Agents Workshop (ALA) at AAMAS 2023.

Cooperative multi-agent reinforcement learning (MARL) requires agents to explore in order to learn to cooperate. Existing value-based MARL algorithms commonly rely on random exploration, such as $\epsilon$-greedy, which is inefficient at discovering multi-agent cooperation. Additionally, the environment in MARL appears non-stationary to any individual agent because the other agents are trained simultaneously, leading to high-variance and therefore unstable optimisation signals. In this work, we propose ensemble value functions for multi-agent exploration (EMAX), a general framework to extend any value-based MARL algorithm. EMAX trains ensembles of value functions for each agent to address the key challenges of exploration and non-stationarity: (1) the uncertainty of value estimates across the ensemble is used in a UCB policy to guide the agents' exploration towards parts of the environment that require cooperation; (2) average value estimates across the ensemble serve as target values. These targets exhibit lower variance than commonly applied target networks, and we show that they lead to more stable gradients during optimisation. We instantiate EMAX with three value-based MARL algorithms (independent DQN, VDN, and QMIX) and evaluate them in 21 tasks across four environments. Using ensembles of five value functions, EMAX improves the sample efficiency and final evaluation returns of these algorithms by 53%, 36%, and 498%, respectively, averaged over all 21 tasks.
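The two ensemble mechanisms described in the abstract can be illustrated with a minimal, framework-free Python sketch. It assumes each agent's ensemble outputs are available as a list of per-member action-value lists (`q_values[k][a]` for member `k` and action `a`), treats the UCB score as the ensemble mean plus `beta` times the ensemble standard deviation, and builds a one-step TD target bootstrapped from the ensemble-mean next-state values. The function names `ucb_action` and `ensemble_target` and the parameters `beta` and `gamma` are hypothetical; the abstract does not give these exact formulas, so EMAX's actual construction (including how targets pass through the VDN or QMIX mixing of each base algorithm) may differ.

```python
import statistics
from typing import Sequence

# Hypothetical layout: q_values[k][a] is ensemble member k's estimate of
# Q_k(o, a) for a single agent's observation o and action a.


def ucb_action(q_values: Sequence[Sequence[float]], beta: float = 1.0) -> int:
    """Return the action maximising ensemble mean + beta * ensemble std.

    The mean/std UCB form and `beta` are illustrative assumptions.
    """
    num_actions = len(q_values[0])
    scores = []
    for a in range(num_actions):
        estimates = [member[a] for member in q_values]
        mean = statistics.fmean(estimates)
        std = statistics.pstdev(estimates)  # disagreement across members
        scores.append(mean + beta * std)
    # Ties are broken in favour of the lowest action index.
    return max(range(num_actions), key=lambda a: scores[a])


def ensemble_target(
    reward: float,
    next_q_values: Sequence[Sequence[float]],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """One-step TD target bootstrapped from ensemble-mean next-state values.

    Averaging the members before taking the max is an assumed reading of
    "average value estimates across the ensemble serve as target values".
    """
    num_actions = len(next_q_values[0])
    mean_q = [
        statistics.fmean([member[a] for member in next_q_values])
        for a in range(num_actions)
    ]
    bootstrap = 0.0 if done else max(mean_q)
    return reward + gamma * bootstrap


# Usage: three members, two actions. Action 0 has the higher mean (1.5 vs 1.0),
# but the members disagree about action 1.
q = [[1.5, 0.0], [1.5, 2.0], [1.5, 1.0]]
print(ucb_action(q, beta=0.0))  # 0: greedy on the ensemble mean
print(ucb_action(q, beta=1.0))  # 1: 1.0 plus a ~0.816 bonus beats 1.5
print(ensemble_target(1.0, [[0.0, 2.0], [0.0, 4.0]], gamma=0.5))  # 2.5
```

With `beta=0` the policy is greedy on the ensemble mean; increasing `beta` favours actions on which the members disagree. For the targets, if the members' errors were independent with equal variance $\sigma^2$, the mean of five members would have variance $\sigma^2/5$; in practice ensemble members are correlated, so the reduction is smaller.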


