The Sliced-Wasserstein distance (SW) is increasingly used in machine learning applications as an alternative to the Wasserstein distance, and it offers significant computational and statistical benefits. Since it is defined as an expectation over random projections, SW is commonly approximated by Monte Carlo sampling. We adopt a new perspective and approximate SW by making use of the concentration of measure phenomenon: under mild assumptions, one-dimensional projections of a high-dimensional random vector are approximately Gaussian. Based on this observation, we develop a simple deterministic approximation for SW. Our method does not require sampling random projections, and is therefore both accurate and easy to use compared to the usual Monte Carlo approximation. We derive nonasymptotic guarantees for our approach and show that, under a weak dependence condition on the data distribution, the approximation error goes to zero as the dimension increases. We validate our theoretical findings on synthetic datasets and illustrate the proposed approximation on a generative modeling problem.
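To make the contrast between the two strategies concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not necessarily the exact estimator derived in the paper: the function names `sw2_monte_carlo` and `sw2_gaussian_approx` are hypothetical, the order of the distance is fixed to 2, and the deterministic version assumes each projected, centered distribution is replaced by a zero-mean Gaussian whose variance is the mean squared norm of the centered data divided by the dimension. Under that assumption the squared distance has a closed form that needs no projections at all.

```python
import numpy as np


def sw2_monte_carlo(x, y, n_projections=500, seed=0):
    """Monte Carlo estimate of the squared SW_2 distance.

    x, y: arrays of shape (n, d) with the same number of rows n.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Directions drawn uniformly on the unit sphere (normalized Gaussians).
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # In one dimension, W_2 between equal-size samples matches sorted values.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    return float(np.mean((px - py) ** 2))


def sw2_gaussian_approx(x, y):
    """Deterministic estimate of the squared SW_2 distance.

    Assumption (for illustration): projections of the centered data are
    treated as N(0, m2 / d), where m2 is the mean squared norm of the
    centered samples. The mean shift contributes ||mean_x - mean_y||^2 / d.
    """
    d = x.shape[1]
    mean_x, mean_y = x.mean(axis=0), y.mean(axis=0)
    m2_x = np.mean(np.sum((x - mean_x) ** 2, axis=1))
    m2_y = np.mean(np.sum((y - mean_y) ** 2, axis=1))
    # W_2^2 between N(0, a) and N(0, b) is (sqrt(a) - sqrt(b))^2.
    scale_term = (np.sqrt(m2_x) - np.sqrt(m2_y)) ** 2
    shift_term = np.sum((mean_x - mean_y) ** 2)
    return float((scale_term + shift_term) / d)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n, d = 2000, 50
    x = rng.standard_normal((n, d))              # N(0, I)
    y = 0.5 + 2.0 * rng.standard_normal((n, d))  # N(0.5 * 1, 4 I)
    print("Monte Carlo   :", sw2_monte_carlo(x, y))
    print("Deterministic :", sw2_gaussian_approx(x, y))
```

The Monte Carlo version costs one sort per projection and its result depends on the random seed, whereas the deterministic version only needs the sample means and mean squared norms. The sketch uses isotropic Gaussian data, for which the Gaussian assumption on the projections holds exactly; on other data the two estimates can differ, particularly in low dimension.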