Reinforcement-based learning has attracted considerable attention both in modeling human behavior and in engineering, for designing measurement- or payoff-based optimization schemes. Such learning schemes offer several advantages, especially in filtering out noisy observations. However, they may face limitations when applied in a distributed setup. In multi-player weakly acyclic games, when each player applies an independent copy of the learning dynamics, convergence to (usually desirable) pure Nash equilibria cannot be guaranteed. Prior work has focused only on a small class of games, namely potential and coordination games. To address this main limitation, this paper introduces a novel payoff-based learning scheme for distributed optimization, namely aspiration-based perturbed learning automata (APLA). In this class of dynamics, and contrary to standard reinforcement-based learning schemes, each player's probability distribution for selecting actions is reinforced both by repeated selection and by an aspiration factor that captures the player's satisfaction level. We provide a stochastic stability analysis of APLA in multi-player positive-utility games in the presence of noisy observations. The first part of the paper characterizes stochastic stability in generic non-zero-sum games by establishing the equivalence of the induced infinite-dimensional Markov chain with a finite-dimensional one. In the second part, the stochastic stability analysis is specialized to weakly acyclic games.
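To make the mechanism concrete, the following is a minimal Python sketch of one APLA-style iteration for a single player, assuming a linear reward-inaction strategy update modulated by a payoff-minus-aspiration satisfaction term and a small uniform perturbation. The step sizes eps and nu, the perturbation rate lam, and the specific functional forms (e.g., max(u - rho, 0)) are illustrative assumptions rather than the paper's definitions.

import numpy as np

rng = np.random.default_rng(0)

def apla_step(x, rho, utility_fn, eps=0.05, nu=0.01, lam=0.01):
    """One hypothetical APLA iteration (illustrative, not the paper's rule).

    x          : mixed strategy (probability vector over the player's actions)
    rho        : aspiration level (the player's satisfaction benchmark)
    utility_fn : returns a (possibly noisy) positive payoff for an action
    eps, nu    : step sizes for strategy and aspiration updates (assumed)
    lam        : perturbation rate -- with probability lam the player
                 experiments uniformly at random (assumed mutation model)
    """
    n = len(x)
    # Perturbed action selection: mostly follow x, occasionally explore.
    a = rng.integers(n) if rng.random() < lam else rng.choice(n, p=x)
    u = utility_fn(a)
    # Reinforce the chosen action, scaled by satisfaction: how far the
    # realized payoff exceeds the current aspiration level (assumed form).
    e_a = np.zeros(n)
    e_a[a] = 1.0
    x = x + eps * max(u - rho, 0.0) * (e_a - x)
    x = np.clip(x, 0.0, None)
    x = x / x.sum()
    # Aspiration tracks a running average of realized payoffs.
    rho = rho + nu * (u - rho)
    return x, rho

# Usage: two actions with noisy positive payoffs; action 1 pays more on average.
payoff = lambda a: max(0.0, (0.4 if a == 0 else 0.8) + 0.05 * rng.standard_normal())
x, rho = np.full(2, 0.5), 0.0
for _ in range(5000):
    x, rho = apla_step(x, rho, payoff)
print(x, rho)  # x should concentrate on action 1; rho near its mean payoff

In this sketch, once the aspiration rho rises above the payoff of an inferior action, that action no longer earns reinforcement even when repeatedly selected, which is one plausible way the satisfaction term can steer play away from undesirable outcomes.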