Decentralized Inertial Best-Response with Voluntary and Limited Communication in Random Communication Networks

Multiple autonomous agents interact over a random communication network to maximize their individual utility functions, which depend on the actions of other agents. We consider decentralized best-response algorithms with inertia, in which each agent forms beliefs about the future actions of other players based on local information, and then either takes an action that maximizes its expected utility computed with respect to these beliefs or repeats its previous action. We show that algorithms of this type converge to a Nash equilibrium in weakly acyclic games, provided that the belief update and information exchange protocols learn the actions of other players with positive probability in finite time in a static environment, i.e., when other agents' actions do not change. We design a decentralized fictitious play algorithm with voluntary and limited communication (DFP-VL) protocols that satisfy this condition. In the voluntary communication protocol, each agent decides whom to exchange information with by assessing the novelty of its information and the potential effect of that information on others' assessments of their utility functions. In the limited communication protocol, agents send only their most frequent action to the agents they decide to communicate with. Numerical experiments on a target assignment game demonstrate that the voluntary and limited communication protocol can more than halve the number of communication attempts while retaining the same convergence rate as DFP, in which agents constantly attempt to communicate.
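
To make the best-response-with-inertia update concrete, the following is a minimal sketch on a toy target assignment game. It is not the DFP-VL algorithm: agents here observe each other's previous actions perfectly, so the belief update and the voluntary and limited communication protocols are skipped entirely. The payoff (1 if an agent is alone on its target, 0 otherwise), the function names, the `inertia` parameter, and the all-on-target-0 starting point are assumptions made for illustration only.

```python
import random


def utility(action, others_actions):
    # Hypothetical payoff: 1 if the agent is alone on its target, else 0.
    return 0.0 if action in others_actions else 1.0


def best_response(current, others_actions, n_targets):
    # Keep the current action whenever it is already a best response;
    # otherwise move to the lowest-indexed utility-maximizing target.
    best = max(utility(k, others_actions) for k in range(n_targets))
    if utility(current, others_actions) == best:
        return current
    return next(k for k in range(n_targets)
                if utility(k, others_actions) == best)


def is_nash(actions, n_targets):
    # A profile is a Nash equilibrium if no agent wants to deviate alone.
    for i, a in enumerate(actions):
        others = actions[:i] + actions[i + 1:]
        if best_response(a, others, n_targets) != a:
            return False
    return True


def inertial_best_response(n_agents, n_targets, inertia=0.5,
                           max_steps=10_000, seed=0):
    rng = random.Random(seed)
    actions = [0] * n_agents  # assumed start: everyone on target 0
    for step in range(max_steps):
        if is_nash(actions, n_targets):
            return actions, step
        new_actions = list(actions)
        for i in range(n_agents):
            if rng.random() < inertia:
                continue  # inertia: repeat the previous action
            # Best respond to the other agents' previous actions.
            others = actions[:i] + actions[i + 1:]
            new_actions[i] = best_response(actions[i], others, n_targets)
        actions = new_actions
    return actions, max_steps


if __name__ == "__main__":
    final_actions, steps = inertial_best_response(n_agents=5, n_targets=5)
    print(final_actions, steps)
```

Inertia is what breaks the symmetry in this toy setting: without it, all agents sharing a target would jump to the same free target at once and collide again, whereas random inertia gives a positive probability that only one of them moves.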
