Motivated by the application of real-time pricing in e-commerce platforms, we consider the problem of revenue maximization in a setting where the seller can leverage contextual information describing the customer's history and the product's type to predict her valuation of the product. However, the customer's true valuation is unobservable to the seller; only a binary outcome, the success or failure of each transaction, is observed. Unlike in usual contextual bandit settings, the optimal price/arm given a covariate in our setting is sensitive to the detailed characteristics of the residual uncertainty distribution. We develop a semi-parametric model in which the residual distribution is non-parametric and provide the first algorithm that learns both the regression parameters and the residual distribution with $\tilde O(\sqrt{n})$ regret. We empirically test a scalable implementation of our algorithm and observe good performance.
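To illustrate why the optimal price depends on more than the predicted mean valuation, the sketch below assumes a hypothetical linear valuation model $v = x^\top\theta + \varepsilon$ (an assumption for illustration, not necessarily the paper's exact specification). A sale occurs when the posted price $p$ satisfies $p \le v$, so expected revenue is $p \cdot \Pr(\varepsilon \ge p - x^\top\theta)$. Two residual distributions with the same mean (0) and variance (1), a standard normal and a symmetric two-point distribution, lead to very different revenue-maximizing prices for the same predicted mean. The names `best_price`, `normal_survival`, and `rademacher_survival` are hypothetical helpers; this is a toy grid search, not the learning algorithm described in the abstract.

```python
import math

# Illustration only (an assumption, not the paper's algorithm): a linear
# valuation model v = x . theta + eps, where eps has a distribution F unknown
# to the seller. The seller posts price p and observes only whether a sale
# happened (p <= v), so expected revenue is R(p) = p * P(eps >= p - x . theta).

def normal_survival(z):
    """P(eps >= z) for eps ~ N(0, 1)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def rademacher_survival(z):
    """P(eps >= z) for eps = +1 or -1 with probability 1/2 each (mean 0, variance 1)."""
    if z <= -1.0:
        return 1.0
    if z <= 1.0:
        return 0.5
    return 0.0

def best_price(mean_valuation, survival, grid):
    """Grid search for the revenue-maximizing price given a residual survival function."""
    return max(grid, key=lambda p: p * survival(p - mean_valuation))

grid = [i / 100 for i in range(0, 501)]  # candidate prices 0.00 .. 5.00
mean_valuation = 2.0  # hypothetical value of x . theta for one customer

p_norm = best_price(mean_valuation, normal_survival, grid)
p_rad = best_price(mean_valuation, rademacher_survival, grid)
print(p_norm, p_rad)
```

With the same predicted mean valuation of 2.0, the normal residual yields an optimal price of roughly 1.67, while the two-point residual yields 3.0. Knowing only the regression parameters is therefore not enough; the residual distribution must also be learned.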