Non-parametric correlation coefficients have been widely used to analyse arbitrary random variables drawn from a common population, particularly when assuming a known, explicit error distribution is unacceptable. We examine an \(\ell_{2}\) representation of a correlation coefficient (Emond and Mason, 2002) from the perspective of a statistical estimator on random variables. We verify a number of interesting and highly desirable properties of this representation, which are mathematically similar to the Whitney embedding of a Hilbert space into the \(\ell_{2}\)-norm space. In particular, we show that, compared with the traditional Spearman (1904) \(\rho\), the proposed Kemeny \(\rho_{\kappa}\) correlation coefficient satisfies the Gauss-Markov conditions in the presence or absence of ties, and therefore admits both discrete and continuous marginal random variables. Under standard regularity conditions we also prove a number of desirable results, including the construction of a Student-t distributed null hypothesis distribution. This parallels standard practice with Pearson's \(r\), but requires neither continuous random variables nor Gaussian errors. Our simulations focus in particular on highly kurtotic data, and show empirical coverage close to the nominal level, consistent with theoretical expectation.
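The following is a minimal sketch, not the authors' implementation, of the score-matrix (\(\ell_{2}\)) construction attributed above to Emond and Mason (2002). For each variable, entry \(a_{ij}\) is \(+1\) if observation \(i\) is ranked ahead of or tied with \(j\), \(-1\) if it is ranked behind, and \(0\) on the diagonal. The coefficient is the inner product of the two score matrices divided by \(n(n-1)\). When there are no ties, this reduces to Kendall's \(\tau\). Two parts of the sketch are assumptions rather than details taken from the abstract. The first is that \(\rho_{\kappa}\) coincides with this normalisation. The second is that the Student-t null uses the Pearson-style statistic \(t = c\sqrt{(n-2)/(1-c^{2})}\) on \(n-2\) degrees of freedom. The function names `score_matrix`, `tau_x`, and `pearson_style_t` are hypothetical.

```python
import math
from typing import Sequence


def score_matrix(x: Sequence[float]) -> list[list[int]]:
    """Score matrix for one variable (Emond-Mason style).

    a[i][j] = +1 if observation i is ranked ahead of or tied with j,
    -1 if it is ranked behind j, and 0 on the diagonal.
    Tied pairs score +1 in both directions, so ties carry information.
    """
    n = len(x)
    return [
        [0 if i == j else (1 if x[i] >= x[j] else -1) for j in range(n)]
        for i in range(n)
    ]


def tau_x(x: Sequence[float], y: Sequence[float]) -> float:
    """l2 (Frobenius) inner product of the two score matrices, scaled by n(n-1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    n = len(x)
    a, b = score_matrix(x), score_matrix(y)
    total = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))
    return total / (n * (n - 1))


def pearson_style_t(coef: float, n: int) -> tuple[float, int]:
    """ASSUMPTION: Pearson-style null statistic t = c*sqrt((n-2)/(1-c^2)) on n-2 df.

    The paper's exact statistic and degrees of freedom for rho_kappa may differ.
    """
    if n < 3:
        raise ValueError("need n >= 3")
    df = n - 2
    if abs(coef) >= 1.0:
        # Perfect (anti-)agreement: the statistic diverges.
        return math.copysign(math.inf, coef), df
    return coef * math.sqrt(df / (1.0 - coef * coef)), df


if __name__ == "__main__":
    x, y = [1, 1, 2], [1, 2, 3]  # x contains a tie
    c = tau_x(x, y)
    print(c, pearson_style_t(c, len(x)))
```

For example, `tau_x([1, 1, 2], [1, 2, 3])` evaluates to \(4/6 = 2/3\). The tied pair contributes \(+1\) in one direction and \(-1\) in the other, so it adds nothing to the sum. A p-value could then be obtained from a Student-t survival function with `df` degrees of freedom, for example `scipy.stats.t.sf`. This is again an assumption about how the null distribution would be applied in practice.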