Policy Choice and Best Arm Identification: Comments on "Adaptive Treatment Assignment in Experiments for Policy Choice"

Adaptive experimental design for efficient decision-making is an important problem in economics. The purpose of this paper is to connect the "policy choice" problem, proposed in Kasy and Sautmann (2021) as an instance of adaptive experimental design, to the frontiers of the bandit literature in machine learning. We discuss how the policy choice problem can be framed so that it is identical to the "best arm identification" (BAI) problem. By connecting the two literatures, we identify that the asymptotic optimality of policy choice algorithms, which Kasy and Sautmann (2021) tackle, is a long-standing open question in the literature. While Kasy and Sautmann (2021) present an interesting and important empirical study, this connection unfortunately highlights several major issues with the theoretical results. In particular, we show that Theorem 1 in Kasy and Sautmann (2021) is false. We find that the proofs of statements (1) and (2) of Theorem 1 are incorrect; although the statements themselves may be true, the proofs are non-trivial to fix. Statement (3), on the other hand, is false, as is its proof, which we show by utilizing existing theoretical results in the bandit literature. As this question is critically important and has garnered much interest within the bandit community over the last decade, we provide a review of recent developments in the BAI literature. We hope this serves to highlight the relevance of this literature to economic problems and to stimulate methodological and theoretical developments in the econometric community.
