ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency - Zhuanzhi Papers


December 2, 2022


Chuming Li, Jie Liu, Yinmin Zhang, Yuhong Wei, Yazhe Niu, Yaodong Yang, Yu Liu, Wanli Ouyang

From arXiv. Accepted by the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023).

Multi-agent reinforcement learning (MARL) suffers from the non-stationarity problem, which refers to the ever-changing learning targets at every iteration when multiple agents update their policies at the same time. Starting from first principles, in this paper we solve the non-stationarity problem by proposing bidirectional action-dependent Q-learning (ACE). Central to the development of ACE is a sequential decision-making process in which only one agent is allowed to take an action at a time. Within this process, at the inference stage, each agent maximizes its value function given the actions taken by the preceding agents. In the learning phase, each agent minimizes a TD error that depends on how the subsequent agents react to its chosen action. Given this bidirectional dependency, ACE effectively turns a multi-agent MDP into a single-agent MDP. We implement the ACE framework by identifying a suitable network representation for the action dependency, so that the sequential decision process is computed implicitly in one forward pass. To validate ACE, we compare it with strong baselines on two MARL benchmarks. Empirical experiments demonstrate that ACE outperforms state-of-the-art algorithms on Google Research Football and the StarCraft Multi-Agent Challenge (SMAC) by a large margin. In particular, on SMAC tasks, ACE achieves a 100% success rate on almost all of the hard and super-hard maps. We further study a range of research questions regarding ACE, including extension, generalization, and practicability. Code is made available to facilitate further research.
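The bidirectional dependency described in the abstract can be illustrated with a minimal tabular sketch. This is our own simplification, not the authors' implementation: the paper uses a neural network that computes the sequential process in one forward pass, whereas here a Python dictionary stands in for the value function. The names `greedy_joint_action`, `td_targets` and `update`, the toy payoff matrix, and the choice to give intermediate (within-step) transitions zero reward and no discount are hypothetical assumptions made for illustration. At inference, agents choose greedily one after another, each conditioning on the actions of the agents before it; during learning, each agent's TD target is the best value the next agent can reach given the actions chosen so far, and only the last agent bootstraps on the next environment state.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, List, Tuple

# Hypothetical tabular stand-in for ACE's action-dependent value network.
# Key: (environment state, actions already chosen by preceding agents, own action).
QTable = DefaultDict[Tuple[Hashable, Tuple[int, ...], int], float]


def greedy_joint_action(q: QTable, state: Hashable, n_agents: int, n_actions: int) -> List[int]:
    """Inference: agents act one at a time; agent i maximizes its Q-value
    given the actions already taken by agents 0..i-1."""
    prefix: Tuple[int, ...] = ()
    for _ in range(n_agents):
        values = [q[(state, prefix, a)] for a in range(n_actions)]
        # Ties are broken toward the lowest action index.
        prefix += (values.index(max(values)),)
    return list(prefix)


def td_targets(q: QTable, state: Hashable, joint_action: List[int], reward: float,
               next_state: Hashable, n_actions: int, gamma: float = 0.99,
               done: bool = False) -> List[float]:
    """Learning: agent i's target is the best value agent i+1 can reach given the
    actions chosen so far; the last agent bootstraps on agent 0 at the next state.
    Assumption: intermediate (within-step) transitions carry no reward or discount."""
    n = len(joint_action)
    targets = []
    for i in range(n - 1):
        prefix = tuple(joint_action[: i + 1])
        targets.append(max(q[(state, prefix, a)] for a in range(n_actions)))
    bootstrap = 0.0 if done else max(q[(next_state, (), a)] for a in range(n_actions))
    targets.append(reward + gamma * bootstrap)
    return targets


def update(q: QTable, state: Hashable, joint_action: List[int], reward: float,
           next_state: Hashable, n_actions: int, lr: float = 0.1,
           gamma: float = 0.99, done: bool = False) -> None:
    """One tabular TD update for every agent in the sequence.
    All targets are computed before any entry is modified."""
    targets = td_targets(q, state, joint_action, reward, next_state, n_actions, gamma, done)
    for i, target in enumerate(targets):
        key = (state, tuple(joint_action[:i]), joint_action[i])
        q[key] += lr * (target - q[key])


if __name__ == "__main__":
    # Toy one-step cooperative game (hypothetical payoffs): the best joint action is (0, 0).
    payoff = [[10.0, -20.0], [0.0, 2.0]]
    q: QTable = defaultdict(float)
    for _ in range(200):
        for a0 in range(2):
            for a1 in range(2):
                update(q, "s", [a0, a1], payoff[a0][a1], "end", n_actions=2, done=True)
    print(greedy_joint_action(q, "s", n_agents=2, n_actions=2))  # prints [0, 0]
```

On this toy game, the first agent's Q-values absorb the second agent's best response to each of its actions, so the greedy sequential rollout settles on the jointly optimal pair (0, 0). Treating each (state, action-prefix) pair as a state of its own is what turns the multi-agent problem into a single-agent one in this sketch.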

