In this paper, we consider an online resource allocation problem where a decision maker accepts or rejects incoming customer requests irrevocably in order to maximize expected reward given limited resources. At each time, a new order/customer/bid is revealed with a request of some resource(s) and a reward. We consider a stochastic setting where all the orders are i.i.d. sampled from an unknown distribution. Such formulation arises from many classic applications such as the canonical (quantity-based) network revenue management problem and the Adwords problem. While the literature on the topic mainly focuses on regret minimization, our paper considers the \textit{fairness} aspect of the problem. On a high level, we define the fairness in a way that a fair online algorithm should treat similar agents/customers similarly, and the decision made for similar agents/customers should be consistent over time. To achieve this goal, we define the fair offline solution as the analytic center of the offline optimal solution set, and introduce \textit{cumulative unfairness} as the cumulative deviation from the online solutions to the fair offline solution over time. We propose a fair algorithm based on an interior-point LP solver and a mechanism that dynamically detects unfair resource spending. Our algorithm achieves cumulative unfairness on the scale of order $O(\log(T))$, while maintains the regret to be bounded without dependency on $T$. In addition, compared to the literature, our result is produced under less restrictive assumptions on the degeneracy of the underlying linear program.
翻译:在本文中,我们考虑一个在线资源分配问题,即决策者接受或拒绝来港客户的要求,在资源有限的情况下不可撤销地接受或拒绝,以最大限度地实现预期的回报。在每次披露新的订单/客户/投标时,都会提出一些资源要求和奖赏。我们考虑的是所有订单都来自未知分布的抽样的随机设置。这种配置来自许多经典应用程序,例如,Canonial(数量基)网络收入管理问题和口号问题。虽然关于该主题的文献主要侧重于尽量减少遗憾,但我们的文件考虑了这一问题的\textit{公平}方面。在高层次上,我们定义公平在线算法应当以同样的方式对待类似的代理商/客户,而对于类似代理商/客户所作的决定应当随着时间的推移一致。为了实现这一目标,我们定义公平离线解决方案是离线最佳解决方案的解析中心(基于数量基), 并且引入\ textnal 不公平的不公平, 作为在线解决办法的累积偏离了在线解决方案, 而我们提出的一个基于公平的离线内部算法的递增成本程序。