The Wasserstein distance is a distance between two probability distributions and has recently gained increasing popularity in statistics and machine learning owing to its attractive properties. One important line of extensions, such as the sliced and max-sliced Wasserstein distances, uses low-dimensional projections of distributions to avoid the high computational cost and the curse of dimensionality in empirical estimation. Despite their practical success in machine learning tasks, statistical inference for projection-based Wasserstein distances remains limited owing to the lack of distributional limit results. In this paper, we consider distances defined by integrating or maximizing Wasserstein distances between low-dimensional projections of two probability distributions. We then derive the limit distributions of these distances when the two distributions are supported on finitely many points. We also propose a bootstrap procedure to estimate the quantiles of these limit distributions from data. This facilitates asymptotically exact interval estimation and hypothesis testing for these distances. Our theoretical results build on the arguments of Sommerfeld and Munk (2018) for deriving distributional limits of the original Wasserstein distance on finite spaces, and on the theory of sensitivity analysis in nonlinear programming. Finally, we conduct numerical experiments to illustrate the theoretical results and demonstrate the applicability of our inferential methods to real data analysis.
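To make the objects in the abstract concrete, the following is a minimal sketch (not the paper's procedure) of a Monte Carlo estimate of the sliced Wasserstein distance between two samples, together with a simple n-out-of-n bootstrap for the quantiles of the rescaled statistic. It assumes `numpy` and `scipy`; the function names `sliced_wasserstein` and `bootstrap_quantile`, the number of projections, and the rescaling by sqrt(nm/(n+m)) are illustrative choices, not specifics from the paper.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def sliced_wasserstein(x, y, n_proj=100, rng=None):
    """Monte Carlo estimate of the sliced 1-Wasserstein distance between
    the empirical distributions of samples x and y (arrays of shape (n, d)).
    Averages the 1-D Wasserstein distance over random projection directions."""
    rng = np.random.default_rng(rng)
    d = x.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)  # uniform direction on the sphere
        total += wasserstein_distance(x @ theta, y @ theta)
    return total / n_proj


def bootstrap_quantile(x, y, stat, level=0.95, n_boot=200, rng=None):
    """Bootstrap estimate of the `level` quantile of the rescaled statistic
    sqrt(nm/(n+m)) * (stat(x*, y*) - stat(x, y)), resampling with replacement.
    (A plain n-out-of-n bootstrap for illustration; the paper's procedure
    may differ in the resampling scheme.)"""
    rng = np.random.default_rng(rng)
    n, m = len(x), len(y)
    base = stat(x, y)
    scale = np.sqrt(n * m / (n + m))
    draws = []
    for _ in range(n_boot):
        xb = x[rng.integers(0, n, size=n)]
        yb = y[rng.integers(0, m, size=m)]
        draws.append(scale * (stat(xb, yb) - base))
    return np.quantile(draws, level)
```

Replacing the average over directions with a maximum over a finite set of candidate directions gives a crude analogue of the max-sliced variant.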