The Best of Many Worlds: Dual Mirror Descent for Online Allocation Problems

Online allocation problems with resource constraints are central problems in revenue management and online advertising. In these problems, requests arrive sequentially during a finite horizon and, for each request, a decision maker needs to choose an action that consumes a certain amount of resources and generates reward. The objective is to maximize cumulative rewards subject to a constraint on the total consumption of resources. In this paper, we consider a data-driven setting in which the reward and resource consumption of each request are generated using an input model that is unknown to the decision maker. We design a general class of algorithms that attain good performance in various input models without knowing which type of input they are facing. In particular, our algorithms are asymptotically optimal under independent and identically distributed inputs as well as various non-stationary stochastic input models, and they attain an asymptotically optimal fixed competitive ratio when the input is adversarial. Our algorithms operate in the Lagrangian dual space: they maintain a dual multiplier for each resource that is updated using online mirror descent. By choosing the reference function accordingly, we recover the dual sub-gradient descent and dual multiplicative weights update algorithm. The resulting algorithms are simple, fast, and do not require convexity in the revenue function, consumption function and action space, in contrast to existing methods for online allocation problems. We discuss applications to network revenue management, online bidding in repeated auctions with budget constraints, online proportional matching with high entropy, and personalized assortment optimization with limited inventory.
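The dual update described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's exact pseudocode and rests on several assumptions made only for illustration: each request offers a finite list of (reward, consumption vector) actions, a null action with zero reward and zero consumption is always available, the target consumption rate is the budget divided by the horizon, and the step size `eta` is fixed. The function name `dual_mirror_descent` and the `update` flag are hypothetical. A squared Euclidean reference function gives the projected sub-gradient step, and a negative-entropy reference function gives the multiplicative weights step.

```python
import math


def dual_mirror_descent(requests, budgets, eta, update="subgradient"):
    """Illustrative dual mirror descent for online allocation (a sketch).

    requests: list of requests; each request is a list of
              (reward, consumption_vector) candidate actions.
    budgets:  total budget per resource.
    eta:      fixed step size (an assumption of this sketch).
    update:   "subgradient" (squared Euclidean reference function) or
              "multiplicative" (negative-entropy reference function).
    """
    horizon = len(requests)
    m = len(budgets)
    rho = [b / horizon for b in budgets]  # target consumption per period
    # Multiplicative updates need a strictly positive starting point.
    mu = [1.0] * m if update == "multiplicative" else [0.0] * m
    remaining = list(budgets)
    total_reward = 0.0

    for actions in requests:
        # Choose the action with the best dual-adjusted reward.
        # The null action (value 0) is the default.
        best_val, best_r, best_b = 0.0, 0.0, [0.0] * m
        for r, b in actions:
            val = r - sum(mu_j * b_j for mu_j, b_j in zip(mu, b))
            if val > best_val:
                best_val, best_r, best_b = val, r, b

        # Take the action only if the remaining resources allow it.
        feasible = all(b_j <= rem_j for b_j, rem_j in zip(best_b, remaining))
        if feasible:
            total_reward += best_r
            remaining = [rem_j - b_j for rem_j, b_j in zip(remaining, best_b)]

        # Sub-gradient of the dual at mu, using the dual-optimal action.
        g = [rho_j - b_j for rho_j, b_j in zip(rho, best_b)]

        if update == "subgradient":
            # Projected gradient step onto the non-negative orthant.
            mu = [max(0.0, mu_j - eta * g_j) for mu_j, g_j in zip(mu, g)]
        else:
            # Multiplicative weights step; stays positive automatically.
            mu = [mu_j * math.exp(-eta * g_j) for mu_j, g_j in zip(mu, g)]

    return total_reward, remaining, mu
```

For example, `dual_mirror_descent([[(1.0, [1.0])]] * 10, [5.0], eta=0.1)` runs the sub-gradient variant on ten identical requests with a budget of five units. Note that the per-request decision is a plain maximization over the candidate actions, so the sketch needs no convexity in the rewards, the consumption, or the action set.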
