The optimal transport (OT) map is a geometry-driven transformation between high-dimensional probability distributions that underpins a wide range of tasks in statistics, applied probability, and machine learning. However, existing statistical theory for OT map estimation is restrictive: it hinges on Brenier's theorem (quadratic cost, absolutely continuous source) to guarantee existence and uniqueness of a deterministic OT map, and then imposes additional regularity assumptions on that map to obtain quantitative error bounds. In many real-world problems these conditions fail or cannot be certified, in which case optimal transportation is possible only via stochastic maps that can split mass. To broaden the scope of map estimation theory to such settings, this work introduces a novel metric for evaluating the transportation quality of stochastic maps. Under this metric, we develop computationally efficient map estimators with near-optimal finite-sample risk bounds under minimal, easy-to-verify assumptions. Our analysis further accommodates common forms of adversarial sample contamination, yielding estimators with robust estimation guarantees. Experiments validate our theory and demonstrate the utility of the proposed framework in settings where existing theory fails. These contributions constitute the first general-purpose theory for map estimation, compatible with a wide spectrum of real-world applications where optimal transport may be intrinsically stochastic.
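To make the mass-splitting phenomenon concrete, the following is a minimal illustrative sketch and not the estimator or metric proposed in this work. It assumes a toy one-dimensional problem with a discrete source, so the absolute-continuity condition of Brenier's theorem fails. In one dimension with a convex cost such as the quadratic cost, the monotone coupling is an optimal plan, and it can be built with the north-west corner rule on sorted atoms. The function name `monotone_coupling` and the atom locations and masses are hypothetical choices made for illustration.

```python
from fractions import Fraction

def monotone_coupling(src_mass, tgt_mass):
    """North-west corner rule on atoms already sorted by location.

    In 1D with a convex cost (e.g. quadratic), the resulting monotone
    coupling is an optimal transport plan.
    """
    a, b = list(src_mass), list(tgt_mass)
    plan = [[Fraction(0)] * len(b) for _ in a]
    i = j = 0
    while i < len(a) and j < len(b):
        moved = min(a[i], b[j])      # ship as much mass as both atoms allow
        plan[i][j] += moved
        a[i] -= moved
        b[j] -= moved
        src_done = a[i] == 0
        tgt_done = b[j] == 0
        if src_done:                 # source atom exhausted: move to the next one
            i += 1
        if tgt_done:                 # target atom filled: move to the next one
            j += 1
    return plan

# Hypothetical toy data: two source atoms, three target atoms (sorted locations).
src_points, src_mass = [0.0, 1.0], [Fraction(1, 2), Fraction(1, 2)]
tgt_points, tgt_mass = [0.0, 0.5, 1.0], [Fraction(1, 3)] * 3

plan = monotone_coupling(src_mass, tgt_mass)

# The stochastic map is the conditional law of the target given each source atom.
kernel = [[p / m for p in row] for row, m in zip(plan, src_mass)]

for x, row in zip(src_points, kernel):
    print(f"source atom {x}: sends mass fractions {[str(p) for p in row]} to {tgt_points}")
```

Each row of `kernel` places positive probability on more than one target atom, so no deterministic map can realize this plan. In fact, no deterministic map can push two equally weighted atoms onto three equally weighted atoms at all, which is the kind of setting where only a stochastic map is available.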