Statistical Analysis of Wasserstein Distributionally Robust Estimators

We consider statistical methods which invoke a min-max distributionally robust formulation to extract good out-of-sample performance in data-driven optimization and learning problems. Acknowledging the distributional uncertainty in learning from limited samples, the min-max formulations introduce an adversarial inner player to explore unseen covariate data. The resulting Distributionally Robust Optimization (DRO) formulations, which include Wasserstein DRO formulations (our main focus), are specified using optimal transportation phenomena. After describing how these infinite-dimensional min-max problems can be approached via a finite-dimensional dual reformulation, the tutorial moves into its main component, namely, explaining a generic recipe for optimally selecting the size of the adversary's budget. This is achieved by studying the limit behavior of an optimal transport projection formulation arising from an inquiry on the smallest confidence region that includes the unknown population risk minimizer. Incidentally, this systematic prescription coincides with those in specific examples in high-dimensional statistics and results in error bounds that are free from the curse of dimensionality. Equipped with this prescription, we present a central limit theorem for the DRO estimator and provide a recipe for constructing compatible confidence regions that are useful for uncertainty quantification. The rest of the tutorial is devoted to insights into the nature of the optimizers selected by the min-max formulations and additional applications of optimal transport projections.
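
For orientation, the min-max problem and the finite-dimensional dual reformulation mentioned above typically take the following form in the Wasserstein DRO literature. The notation is assumed here for illustration and does not appear in the abstract: a loss \ell, a parameter \theta, a transport cost c, an adversary budget \delta, and the empirical measure P_n of the samples X_1, ..., X_n. The duality is stated as a sketch; it holds under mild regularity conditions on \ell and c (for example, c nonnegative and lower semicontinuous with c(x, x) = 0, and \ell upper semicontinuous in x).

```latex
\begin{align*}
% Wasserstein DRO: the learner minimizes the worst-case risk over an
% optimal transport ball of radius \delta around the empirical measure.
&\min_{\theta} \; \sup_{P \,:\, D_c(P, P_n) \le \delta} \mathbb{E}_{P}\big[\ell(\theta; X)\big],
\qquad
D_c(P, P_n) = \inf_{\pi \in \Pi(P, P_n)} \mathbb{E}_{\pi}\big[c(X, X')\big], \\[4pt]
% Dual reformulation: the supremum over probability measures reduces to
% a one-dimensional minimization over the multiplier \lambda.
&\sup_{P \,:\, D_c(P, P_n) \le \delta} \mathbb{E}_{P}\big[\ell(\theta; X)\big]
= \inf_{\lambda \ge 0} \Big\{ \lambda \delta
+ \frac{1}{n} \sum_{i=1}^{n} \sup_{x} \big[ \ell(\theta; x) - \lambda\, c(x, X_i) \big] \Big\}.
\end{align*}
```

Here \Pi(P, P_n) denotes the set of couplings with marginals P and P_n. The left-hand side of the second line optimizes over an infinite-dimensional set of distributions, whereas the right-hand side involves only the scalar \lambda and one inner maximization per sample, which is what makes the problem computationally approachable. The budget \delta is the quantity whose optimal size the tutorial sets out to select.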
