Measuring dependence between random vectors via optimal transport

To quantify the dependence between two random vectors of possibly different dimensions, we propose to rely on the properties of the 2-Wasserstein distance. We first introduce two coefficients based on the Wasserstein distance between the actual distribution and a reference distribution with independent components. The coefficients are normalized to take values between 0 and 1, where 1 represents the maximal amount of dependence possible given the two multivariate margins. We then make a quasi-Gaussian assumption that yields two additional coefficients rooted in the same ideas as the first two. These latter coefficients are more amenable to distributional results and admit attractive formulas in terms of the joint covariance or correlation matrix. Furthermore, we prove that maximal dependence occurs at the covariance matrix with minimal von Neumann entropy given the covariance matrices of the two multivariate margins. This result also helps us revisit the RV coefficient by proposing a sharper normalization. The two coefficients based on the quasi-Gaussian approach are easily estimated from the empirical covariance matrix. The estimators are asymptotically normal, and their asymptotic variances are explicit functions of the covariance matrix, so they too can be estimated consistently. The results extend to the Gaussian copula case, where the estimators are rank-based. The results are illustrated through theoretical examples, Monte Carlo simulations, and a case study involving electroencephalography data.
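Under a Gaussian assumption, the 2-Wasserstein distance between the joint law and the independence reference (same margins, zero cross-covariance) has a closed form. For centred Gaussians N(0, Σ) and N(0, Σ₀), the Bures-Wasserstein formula gives W₂² = tr(Σ) + tr(Σ₀) − 2 tr((Σ₀^{1/2} Σ Σ₀^{1/2})^{1/2}), where here Σ₀ keeps the diagonal blocks of Σ and zeroes the cross blocks. The sketch below computes this unnormalized squared distance from a joint covariance matrix, assuming the first `p` coordinates form X; the helper names are hypothetical, and the normalization to [0, 1] described in the abstract is not reproduced.

```python
import numpy as np


def psd_sqrt(a: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # clip tiny negative eigenvalues caused by round-off
    return (v * np.sqrt(w)) @ v.T  # V diag(sqrt(w)) V^T


def w2_sq_to_independence(sigma: np.ndarray, p: int) -> float:
    """Squared 2-Wasserstein distance between N(0, sigma) and N(0, sigma0).

    Hypothetical helper: the first p coordinates are X, the rest are Y, and
    sigma0 keeps the diagonal blocks of sigma while zeroing the cross blocks.
    """
    sigma = np.asarray(sigma, dtype=float)
    sigma0 = np.zeros_like(sigma)
    sigma0[:p, :p] = sigma[:p, :p]
    sigma0[p:, p:] = sigma[p:, p:]
    root0 = psd_sqrt(sigma0)
    cross = psd_sqrt(root0 @ sigma @ root0)
    # Bures-Wasserstein formula for centred Gaussians
    value = np.trace(sigma) + np.trace(sigma0) - 2.0 * np.trace(cross)
    return float(max(value, 0.0))  # guard against a tiny negative round-off


if __name__ == "__main__":
    # Plug-in estimate from simulated data: X is 1-dimensional, Y is 2-dimensional.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((500, 1))
    y = np.hstack([x + 0.5 * rng.standard_normal((500, 1)),
                   rng.standard_normal((500, 1))])
    sigma_hat = np.cov(np.hstack([x, y]), rowvar=False)  # empirical covariance
    print(w2_sq_to_independence(sigma_hat, p=1))
```

For two scalar margins with unit variance and correlation ρ, the formula reduces to 4 − 2(√(1+ρ) + √(1−ρ)), which vanishes at ρ = 0 and grows with |ρ|.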
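The von Neumann entropy of a covariance matrix can be computed from its eigenvalues after scaling the matrix to unit trace: S = −Σᵢ λᵢ log λᵢ. The sketch below assumes this trace-normalized convention, which may differ from the exact convention used in the paper; `von_neumann_entropy` is a hypothetical name. For two scalar margins with unit variance, the entropy decreases as |ρ| grows and reaches 0 at |ρ| = 1, illustrating how stronger dependence corresponds to lower entropy in this simple case.

```python
import numpy as np


def von_neumann_entropy(sigma: np.ndarray) -> float:
    """Von Neumann entropy of sigma / trace(sigma), with 0 * log 0 taken as 0."""
    eigenvalues = np.clip(np.linalg.eigvalsh(sigma), 0.0, None)
    weights = eigenvalues / eigenvalues.sum()  # unit-trace spectrum
    weights = weights[weights > 0]  # drop zero eigenvalues (0 * log 0 = 0)
    return float(-np.sum(weights * np.log(weights)))


if __name__ == "__main__":
    # Unit-variance scalar margins: entropy falls as the correlation grows.
    for rho in (0.0, 0.5, 0.9, 1.0):
        sigma = np.array([[1.0, rho], [rho, 1.0]])
        print(rho, von_neumann_entropy(sigma))
```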
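For reference, the classical RV coefficient of X and Y is RV = tr(Σ_XY Σ_YX) / √(tr(Σ_XX²) tr(Σ_YY²)). The sketch below computes it from a joint covariance matrix, assuming the first `p` coordinates form X; `rv_coefficient` is a hypothetical name, and the sharper normalization proposed in the abstract is not reproduced. The example shows why the usual normalization can be loose: with Var(X) = 1 and Cov(Y) = I₂, positive semi-definiteness forces ‖Σ_XY‖² ≤ 1, so RV never exceeds 1/√2, even when X is itself a coordinate of Y.

```python
import numpy as np


def rv_coefficient(sigma: np.ndarray, p: int) -> float:
    """Classical RV coefficient from a joint covariance matrix (X = first p coordinates)."""
    s11, s22, s12 = sigma[:p, :p], sigma[p:, p:], sigma[:p, p:]
    numerator = np.trace(s12 @ s12.T)  # tr(S12 S21), since S21 = S12^T
    denominator = np.sqrt(np.trace(s11 @ s11) * np.trace(s22 @ s22))
    return float(numerator / denominator)


if __name__ == "__main__":
    # X scalar, Y = (X, Z) with Z independent of X, all variances equal to 1.
    sigma = np.array([[1.0, 1.0, 0.0],
                      [1.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
    print(rv_coefficient(sigma, p=1))  # 1/sqrt(2), about 0.7071
```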
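In the Gaussian copula case, the covariance matrix is replaced by a rank-based estimate of the copula correlation matrix. One common choice, assumed here (the paper's exact estimator may differ), is the normal-scores correlation: replace each observation by Φ⁻¹(R/(n+1)), where R is its rank within its column and Φ is the standard normal distribution function, then take the Pearson correlation of these scores. The result is unchanged by strictly increasing transformations of each coordinate. `normal_scores_correlation` is a hypothetical name, and ties are broken by order of appearance, which is adequate for continuous data.

```python
from statistics import NormalDist

import numpy as np


def normal_scores_correlation(data: np.ndarray) -> np.ndarray:
    """Pearson correlation of normal scores Phi^{-1}(rank / (n + 1)), column by column."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    # Ranks 1..n within each column (ties broken by position; assumes continuous data).
    ranks = np.argsort(np.argsort(data, axis=0, kind="stable"), axis=0) + 1
    inv_cdf = np.vectorize(NormalDist().inv_cdf)  # standard normal quantile function
    scores = inv_cdf(ranks / (n + 1.0))
    return np.corrcoef(scores, rowvar=False)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    z = rng.standard_normal((300, 2))
    # Monotone distortions of the margins leave the rank-based estimate unchanged.
    distorted = np.column_stack([np.exp(z[:, 0]), z[:, 1] ** 3])
    print(np.allclose(normal_scores_correlation(z), normal_scores_correlation(distorted)))
```

The resulting matrix can then be used in place of a covariance matrix in covariance-based formulas such as those above.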
