Taming Wild Price Fluctuations: Monotone Stochastic Convex Optimization with Bandit Feedback

Prices generated by automated price experimentation algorithms often display wild fluctuations, leading to unfavorable customer perceptions and violations of individual fairness: e.g., the price seen by a customer can be significantly higher than what was seen by her predecessors, only to fall once again later. To address this concern, we propose demand learning under a monotonicity constraint on the sequence of prices, within the framework of stochastic convex optimization with bandit feedback. Our main contribution is the design of the first sublinear-regret algorithms for monotonic price experimentation for smooth and strongly concave revenue functions under noisy as well as noiseless bandit feedback. The monotonicity constraint presents a unique challenge: since any increase (or decrease) in the decision-levels is final, an algorithm needs to be cautious in its exploration to avoid over-shooting the optimum. At the same time, minimizing regret requires that progress be made towards the optimum at a sufficient pace. Balancing these two goals is particularly challenging under noisy feedback, where obtaining sufficiently accurate gradient estimates is expensive. Our key innovation is to utilize conservative gradient estimates to adaptively tailor the degree of caution to local gradient information, being aggressive far from the optimum and being increasingly cautious as the prices approach the optimum. Importantly, we show that our algorithms guarantee the same regret rates (up to logarithmic factors) as the best achievable rates of regret without the monotonicity requirement.
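To make the idea of conservative gradient estimates concrete, the following is a minimal sketch for the noiseless case only. It is not the algorithm from the paper. It assumes a one-dimensional price that may only increase, a concave revenue function whose derivative is Lipschitz with a known constant `beta`, and a starting price below the optimum. The names `monotone_price_search`, `beta`, and `delta` are hypothetical. By concavity, the secant slope between two nearby prices never exceeds the true gradient at the lower price, so a step of `slope / beta` cannot pass the optimum. Steps are large far from the optimum and shrink as the slope flattens.

```python
from typing import Callable, List


def monotone_price_search(
    revenue: Callable[[float], float],
    start: float,
    beta: float,
    delta: float = 0.01,
    max_rounds: int = 1000,
) -> List[float]:
    """Return the non-decreasing sequence of prices charged.

    Assumptions (illustrative, not from the paper): noiseless feedback,
    concave revenue with a beta-Lipschitz derivative, start below the optimum.
    """
    prices: List[float] = []
    x = start
    for _ in range(max_rounds):
        # Charge x, then the slightly higher price x + delta.
        prices.append(x)
        r_low = revenue(x)
        prices.append(x + delta)
        r_high = revenue(x + delta)

        # Secant slope: a conservative (lower) estimate of the gradient at x.
        slope = (r_high - r_low) / delta

        # If the safe step would not move past x + delta, stop raising prices.
        if slope <= beta * delta:
            break

        # Safe step: x + slope / beta never exceeds the optimum, and it is
        # larger than x + delta, so the price sequence stays monotone.
        x = x + slope / beta
    return prices


def example_revenue(p: float) -> float:
    # Hypothetical revenue curve p * (10 - p), maximized at p = 5.
    return p * (10.0 - p)


example_prices = monotone_price_search(example_revenue, start=1.0, beta=4.0)
```

In this sketch the only possible overshoot comes from the probe at `x + delta`, so it is bounded by `delta`. Handling noisy feedback, where each slope estimate needs repeated samples and a confidence margin, is the harder case the abstract describes and is not covered here.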
