In the Learning to Price setting, a seller posts prices over time with the goal of maximizing revenue while learning the buyer's valuation. This problem is well understood when values are stationary (fixed or i.i.d.). Here we study the problem where the buyer's value is a moving target, i.e., it changes over time, either according to a stochastic process or adversarially with bounded variation. In either case, we provide matching upper and lower bounds on the optimal revenue loss. Since the target is moving, any information learned soon becomes outdated, which forces the algorithms to keep switching between exploration and exploitation phases.
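To make the setting concrete, the following is a minimal simulation sketch of posted pricing against a drifting value. It is not the algorithm analyzed in this work. Everything in it is an assumption made for illustration: the function name `simulate`, the bounded random-walk model for the value, the interval-widening rule, and the `4 * drift` threshold that decides between exploring and exploiting. The seller only observes whether each posted price was accepted.

```python
import random


def simulate(rounds=1000, drift=0.01, seed=0):
    """Toy posted-price loop against a value that moves by at most `drift` per round.

    Hypothetical illustration only; all modeling choices here are assumptions.
    Returns (total revenue, total revenue loss versus always pricing at the value).
    """
    rng = random.Random(seed)
    value = 0.5          # buyer's hidden value, kept in [0, 1]
    lo, hi = 0.0, 1.0    # seller's interval known to contain the value
    revenue = 0.0
    loss = 0.0
    for _ in range(rounds):
        # The target moves: a bounded random step, clipped to [0, 1].
        value = min(1.0, max(0.0, value + rng.uniform(-drift, drift)))
        # Old information goes stale, so the interval is widened by the
        # same bound before each new price is posted.
        lo = max(0.0, lo - drift)
        hi = min(1.0, hi + drift)
        # Explore (bisect) while the interval is wide; otherwise exploit
        # by posting the lower end, which is guaranteed to be accepted.
        exploring = (hi - lo) > 4 * drift
        price = (lo + hi) / 2 if exploring else lo
        sold = price <= value
        if sold:
            revenue += price
            lo = max(lo, price)   # accepted: value is at least the price
        else:
            hi = min(hi, price)   # rejected: value is below the price
        # Loss relative to a seller who knows the value in every round.
        loss += value - (price if sold else 0.0)
    return revenue, loss


if __name__ == "__main__":
    for d in (0.001, 0.01, 0.05):
        rev, lost = simulate(drift=d)
        print(f"drift={d}: revenue={rev:.2f}, loss={lost:.2f}")
```

Because the interval grows every round, the sketch cannot stop exploring after an initial learning phase. It has to re-enter bisection whenever the accumulated uncertainty exceeds the threshold, which is the alternation between exploration and exploitation described above.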