Competing Bandits: The Perils of Exploration Under Competition

Most online platforms strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information. We study the interplay between exploration and competition: how such platforms balance exploration for learning against competition for users. Here users play three distinct roles: they are customers who generate revenue, they are sources of data for learning, and they are self-interested agents who choose among the competing platforms. We consider a stylized duopoly model in which two firms face the same multi-armed bandit problem. Users arrive one by one and choose between the two firms, so that each firm makes progress on its bandit problem only if it is chosen. Through a mix of theoretical results and numerical simulations, we study whether and to what extent competition incentivizes the adoption of better bandit algorithms, and whether it leads to welfare increases for users. We find that stark competition induces firms to commit to a "greedy" bandit algorithm that leads to low welfare. However, weakening competition by providing firms with some "free" users incentivizes better exploration strategies and increases welfare. We investigate two channels for weakening the competition: relaxing the rationality of users and giving one firm a first-mover advantage. Our findings are closely related to the "competition vs. innovation" relationship, and elucidate the first-mover advantage in the digital economy.
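The duopoly described above can be illustrated with a minimal Python sketch. It is not the authors' simulation code. Several details are assumptions made only for illustration: Bernoulli rewards, a sliding-window average of recent rewards as each firm's "reputation", users who pick the firm with the higher reputation, an epsilon-greedy rule standing in for a "better" exploration strategy, and the hypothetical names `GreedyFirm`, `EpsilonGreedyFirm`, `simulate`, `free_users`, and `window`.

```python
import random


class GreedyFirm:
    """Bernoulli bandit learner that always plays the arm with the best estimate."""

    def __init__(self, n_arms, rng):
        self.rng = rng
        self.pulls = [0] * n_arms
        self.wins = [0] * n_arms

    def estimate(self, arm):
        # Smoothed empirical mean; unplayed arms start at 0.5 (an assumption).
        return (self.wins[arm] + 1) / (self.pulls[arm] + 2)

    def choose_arm(self):
        return max(range(len(self.pulls)), key=self.estimate)

    def update(self, arm, reward):
        self.pulls[arm] += 1
        self.wins[arm] += reward


class EpsilonGreedyFirm(GreedyFirm):
    """Plays a uniformly random arm with probability epsilon, else acts greedily."""

    def __init__(self, n_arms, rng, epsilon=0.1):
        super().__init__(n_arms, rng)
        self.epsilon = epsilon

    def choose_arm(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.pulls))
        return super().choose_arm()


def simulate(firms, arm_means, n_users, free_users, rng, window=20):
    """Run one duopoly game; return (users served per firm, average user reward)."""
    recent = [[], []]   # last `window` rewards of each firm, used as its reputation
    served = [0, 0]
    total_reward = 0
    for t in range(n_users):
        if t < 2 * free_users:
            # "Free" users: assigned alternately, regardless of reputation.
            i = t % 2
        else:
            # Later users pick the firm with the higher reputation.
            reps = [sum(r) / len(r) if r else 0.5 for r in recent]
            if reps[0] == reps[1]:
                i = rng.randrange(2)
            else:
                i = 0 if reps[0] > reps[1] else 1
        # Only the chosen firm pulls an arm and learns from the outcome.
        arm = firms[i].choose_arm()
        reward = 1 if rng.random() < arm_means[arm] else 0
        firms[i].update(arm, reward)
        recent[i] = (recent[i] + [reward])[-window:]
        served[i] += 1
        total_reward += reward
    return served, total_reward / n_users


if __name__ == "__main__":
    rng = random.Random(0)
    firms = [GreedyFirm(3, rng), EpsilonGreedyFirm(3, rng, epsilon=0.1)]
    served, welfare = simulate(firms, [0.3, 0.5, 0.7], 2000, 50, rng)
    print("users served per firm:", served)
    print("average user reward:", welfare)
```

Varying `free_users` is one hypothetical way to probe how the number of "free" users changes which algorithm attracts more users. Any single run is noisy, so conclusions would need averages over many seeds.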
