This paper studies the Random Utility Model (RUM) in a repeated stochastic choice setting in which the decision maker is imperfectly informed about the payoff of each available alternative. We develop a gradient-based learning algorithm by embedding the RUM into an online decision problem. We show that a large class of RUMs is Hannan consistent (\citet{Hahn1957}); that is, the average difference between the expected payoffs generated by a RUM and those of the best fixed policy in hindsight goes to zero as the number of periods increases. In addition, we show that our gradient-based algorithm is equivalent to the Follow the Regularized Leader (FTRL) algorithm, which is widely used in the machine learning literature to model learning in repeated stochastic choice problems. Thus, we provide an economically grounded optimization framework for the FTRL algorithm. Finally, we apply our framework to study recency bias, no-regret learning in normal form games, and prediction markets.
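As a rough sketch of the consistency notion invoked above (the symbols $u_t$, $x_t$, $\Delta(A)$, and $T$ are our own shorthand, not notation taken from the paper), Hannan consistency amounts to the time-averaged regret against the best fixed policy vanishing:
\[
\frac{1}{T}\left( \max_{x \in \Delta(A)} \sum_{t=1}^{T} u_t(x) \;-\; \sum_{t=1}^{T} \mathbb{E}\!\left[ u_t(x_t) \right] \right) \;\longrightarrow\; 0 \quad \text{as } T \to \infty,
\]
where $x_t$ denotes the (random) choice produced by the RUM in period $t$, $u_t$ the period-$t$ payoff function, and $\Delta(A)$ the set of fixed policies over the alternatives.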