Mean-Field Multi-Agent Reinforcement Learning: A Decentralized Network Approach

One of the challenges for multi-agent reinforcement learning (MARL) is designing efficient learning algorithms for a large system in which each agent has only limited or partial information about the entire system. In such a system, it is desirable to learn decentralized policies. A recent and promising paradigm for analyzing decentralized MARL is to take network structures into consideration. While exciting progress has been made in analyzing decentralized MARL with a network of agents, as often found in social networks and team video games, little is known theoretically about decentralized MARL with a network of states, which is frequently used to model self-driving vehicles, ride-sharing, and data and traffic routing. This paper proposes a framework called localized training and decentralized execution to study MARL with a network of states and homogeneous (a.k.a. mean-field type) agents. Localized training means that agents only need to collect local information from their neighboring states during the training phase; decentralized execution means that, after the training stage, agents can execute the learned decentralized policies, which require knowledge of only the agents' current states. The key idea is to exploit the homogeneity of agents and regroup them according to their states. This leads to the formulation of a networked Markov decision process with teams of agents, which enables the Q-function to be updated in a localized fashion. To design an efficient and scalable reinforcement learning algorithm under this framework, we adopt the actor-critic approach with over-parameterized neural networks, and establish the convergence and sample complexity of our algorithm, which is shown to be scalable with respect to the size of both agents and states.
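The regrouping idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it only shows, under assumed names, how homogeneous agents might be grouped into per-state teams and how a k-hop neighborhood on the state network could restrict what is observed during training. The functions `team_sizes`, `k_hop_neighborhood`, and `local_observation`, the line-graph example, and the choice of a k-hop radius are all hypothetical.

```python
from collections import Counter, deque


def team_sizes(agent_states, num_states):
    """Regroup homogeneous agents by state: number of agents at each state node."""
    counts = Counter(agent_states)
    return [counts.get(s, 0) for s in range(num_states)]


def k_hop_neighborhood(adjacency, state, k):
    """Return the states within k hops of `state` (itself included), via BFS."""
    seen = {state}
    frontier = deque([(state, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond the k-hop radius
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return sorted(seen)


def local_observation(adjacency, agent_states, state, k):
    """Fraction of all agents sitting at each state in the k-hop neighborhood.

    Assumption: this localized slice of the empirical state distribution stands
    in for the "local information" a team would collect during training.
    """
    sizes = team_sizes(agent_states, len(adjacency))
    total = len(agent_states)
    return {s: sizes[s] / total for s in k_hop_neighborhood(adjacency, state, k)}


# Hypothetical example: five states on a line graph 0-1-2-3-4 and six agents.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
agent_states = [0, 0, 1, 2, 2, 4]
obs = local_observation(adjacency, agent_states, state=2, k=1)
```

In this toy setting the team at state 2 sees only the agent fractions at states 1, 2, and 3, so the size of its input depends on the neighborhood rather than on the total number of agents or states.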

