Estimating $\alpha$-Rank by Maximizing Information Gain

Game theory has been increasingly applied in settings where the game is not known outright, but has to be estimated by sampling. For example, meta-games that arise in multi-agent evaluation can only be accessed by running a succession of expensive experiments that may involve simultaneous deployment of several agents. In this paper, we focus on $\alpha$-rank, a popular game-theoretic solution concept designed to perform well in such scenarios. We aim to estimate the $\alpha$-rank of the game using as few samples as possible. Our algorithm maximizes information gain between an epistemic belief over the $\alpha$-ranks and the observed payoff. This approach has two main benefits. First, it allows us to focus our sampling on the entries that matter the most for identifying the $\alpha$-rank. Second, the Bayesian formulation provides a facility to build in modeling assumptions by using a prior over game payoffs. We show the benefits of using information gain as compared to the confidence interval criterion of ResponseGraphUCB (Rowland et al. 2019), and provide theoretical results justifying our method.
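
The sampling loop described above can be sketched in Python. This is a minimal illustration under our own simplifying assumptions, not the authors' implementation. The assumptions are:

- The meta-game is a two-player win/loss game. Entry θ[i, j] is player 0's win probability, and player 1 receives 1 - θ[i, j].
- Each entry has an independent Beta(1, 1) prior.
- α-rank is computed at a finite α = 10 with population size m = 50.
- The α-rank is summarized by its highest-mass profile rather than the full ranking.

All function names (`fixation_prob`, `alpha_rank`, `info_gain`, `estimate_alpha_rank`, ...) are hypothetical. Information gain is estimated by Monte Carlo. The sketch samples games from the posterior and computes each sample's α-rank. It then measures how much one more observation of an entry would reveal about which profile comes out on top.

```python
import numpy as np


def fixation_prob(delta, alpha, pop_size):
    """Fixation probability of a mutant whose payoff exceeds the resident's by `delta`."""
    if delta == 0.0:
        return 1.0 / pop_size
    # (1 - e^{-alpha*delta}) / (1 - e^{-m*alpha*delta}), written with expm1 for accuracy.
    return np.expm1(-alpha * delta) / np.expm1(-pop_size * alpha * delta)


def alpha_rank(m0, m1, alpha=10.0, pop_size=50):
    """Stationary distribution of a two-population alpha-rank chain (finite alpha).
    m0[i, j], m1[i, j]: payoffs of players 0 and 1 at profile (i, j); state = i * k1 + j."""
    k0, k1 = m0.shape
    n = k0 * k1
    eta = 1.0 / ((k0 - 1) + (k1 - 1))
    c = np.zeros((n, n))
    for i in range(k0):
        for j in range(k1):
            s = i * k1 + j
            for i2 in range(k0):  # player 0 deviates, player 1 stays
                if i2 != i:
                    c[s, i2 * k1 + j] = eta * fixation_prob(m0[i2, j] - m0[i, j], alpha, pop_size)
            for j2 in range(k1):  # player 1 deviates, player 0 stays
                if j2 != j:
                    c[s, i * k1 + j2] = eta * fixation_prob(m1[i, j2] - m1[i, j], alpha, pop_size)
            c[s, s] = 1.0 - c[s].sum()
    # Solve pi @ C = pi together with sum(pi) = 1.
    a = np.vstack([c.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(a, b, rcond=None)
    return pi


def binary_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), in nats."""
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))


def posterior_samples(wins, losses, rng, n_samples):
    """Sample win-rate tables from Beta(1 + wins, 1 + losses) and record each game's top profile.
    Assumption: zero-sum win/loss game, so player 1's payoff is 1 - theta."""
    thetas = rng.beta(wins + 1, losses + 1, size=(n_samples,) + wins.shape)
    tops = np.array([np.argmax(alpha_rank(th, 1.0 - th)) for th in thetas])
    return thetas, tops


def info_gain(thetas, tops, entry):
    """Monte Carlo estimate of I(top profile; outcome of one more sample of `entry`)."""
    p1 = thetas[:, entry[0], entry[1]]  # P(player 0 wins | sampled game)
    p_y = p1.mean()
    gain = 0.0
    for t in np.unique(tops):
        mask = tops == t
        # sum_t P(t) * KL(P(y | t) || P(y))
        gain += mask.mean() * binary_kl(p1[mask].mean(), p_y)
    return float(gain)


def estimate_alpha_rank(true_theta, n_rounds=30, n_samples=300, seed=0):
    """Greedy loop: each round, sample the entry with the highest estimated information gain."""
    rng = np.random.default_rng(seed)
    wins = np.zeros_like(true_theta)
    losses = np.zeros_like(true_theta)
    entries = [(i, j) for i in range(true_theta.shape[0]) for j in range(true_theta.shape[1])]
    for _ in range(n_rounds):
        thetas, tops = posterior_samples(wins, losses, rng, n_samples)
        gains = [info_gain(thetas, tops, e) for e in entries]
        i, j = entries[int(np.argmax(gains))]
        # One (expensive) evaluation: a single win/loss outcome for player 0 at (i, j).
        if rng.random() < true_theta[i, j]:
            wins[i, j] += 1
        else:
            losses[i, j] += 1
    return wins, losses


if __name__ == "__main__":
    true_theta = np.array([[0.8, 0.9], [0.1, 0.2]])  # hypothetical win rates for player 0
    wins, losses = estimate_alpha_rank(true_theta)
    print("samples per entry:\n", wins + losses)
    post_mean = (wins + 1) / (wins + losses + 2)
    print("alpha-rank of posterior mean:", alpha_rank(post_mean, 1 - post_mean).round(3))
```

The sketch reuses one batch of posterior samples to score every candidate entry. This keeps the cost of a round at `n_samples` α-rank evaluations. A one-step greedy choice is the simplest form of information-gain sampling. The full method may differ in how it represents the belief over α-ranks and in its choice of α.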