We consider strategic settings in which several users engage in a repeated online interaction, each assisted by a regret-minimizing learning agent that repeatedly plays a "game" on the user's behalf. We study the dynamics and average outcomes of the repeated game between the agents, and propose viewing it as inducing a "meta-game" between the users. Our main focus is on whether users can benefit in this meta-game from "manipulating" their own agents by misreporting their parameters to them. We formally define the model of these meta-games for general games and analyze the equilibria they induce among the users in two classes of games in which the time-average of all regret-minimizing dynamics converges to a single equilibrium.
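To make the setup concrete, the following is a minimal sketch of one such meta-game, not the model or the games analyzed in this work. It assumes two agents running Hedge (multiplicative weights, one standard regret-minimizing algorithm) with full-information expected payoffs, and a hypothetical 2x2 game whose payoff matrices are invented for illustration. The names `hedge_dynamics` and `user_utility` and the parameters `rounds` and `eta` are likewise assumptions. The row user hands the agent either the true payoff matrix or a misreported one, and is then evaluated on the true payoffs under the time-averaged play.

```python
import numpy as np

def hedge_dynamics(payoff_row, payoff_col, rounds=2000, eta=0.1):
    """Two Hedge (multiplicative-weights) agents play a bimatrix game.

    Each agent sees the expected payoff of every one of its actions against
    the opponent's current mixed strategy (a full-information assumption).
    Returns the time-averaged joint distribution over action profiles.
    """
    payoff_row = np.asarray(payoff_row, dtype=float)
    payoff_col = np.asarray(payoff_col, dtype=float)
    n, m = payoff_row.shape
    cum_row = np.zeros(n)  # cumulative payoff of each row action
    cum_col = np.zeros(m)  # cumulative payoff of each column action
    avg_joint = np.zeros((n, m))
    for _ in range(rounds):
        # Exponential weights; subtracting the max keeps exp() stable.
        x = np.exp(eta * (cum_row - cum_row.max()))
        x /= x.sum()
        y = np.exp(eta * (cum_col - cum_col.max()))
        y /= y.sum()
        avg_joint += np.outer(x, y)
        cum_row += payoff_row @ y  # payoff of each row action against y
        cum_col += x @ payoff_col  # payoff of each column action against x
    return avg_joint / rounds

def user_utility(true_payoff, avg_joint):
    """Average payoff to a user, measured with the user's TRUE payoffs."""
    return float(np.sum(np.asarray(true_payoff, dtype=float) * avg_joint))

# Hypothetical game (illustrative numbers only).
true_row = [[2, 4], [1, 3]]      # row user's true payoffs: Top is dominant
declared_row = [[1, 3], [2, 4]]  # misreport: Bottom looks dominant
col = [[1, 0], [0, 1]]           # column user wants to match the row action

u_truth = user_utility(true_row, hedge_dynamics(true_row, col))
u_lie = user_utility(true_row, hedge_dynamics(declared_row, col))
print(f"truthful report: {u_truth:.2f}, misreport: {u_lie:.2f}")
```

In this toy game the truthful agent settles on its dominant action and the column agent matches it, giving the row user an average payoff close to 2. With the misreported matrix, the agent settles on the other row, the column agent again matches, and the row user's true average payoff is close to 3. The misreport acts as a commitment device here; whether such manipulation pays off in general depends on the game.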