Asymptotic methods for hypothesis testing in high-dimensional data usually require the dimension of the observations to increase to infinity, often with an additional relationship between the dimension (say, $p$) and the sample size (say, $n$). On the other hand, multivariate asymptotic testing methods are valid for fixed dimension only, and their implementations typically require the sample size to be large compared to the dimension to yield desirable results. In practice, it is usually not possible to determine whether the dimension of the data conforms to the conditions required for the validity of high-dimensional asymptotic methods, or whether the sample size is large enough compared to the dimension. In this work, we first describe the notion of uniform-over-$p$ convergence and subsequently develop a uniform-over-dimension central limit theorem. We then develop an asymptotic test for the two-sample equality of locations that holds uniformly over the dimension of the observations. Using simulated and real data, we demonstrate that the proposed test performs better than several popular tests in the literature for high-dimensional data, as well as the usual scaled two-sample tests for multivariate data, including Hotelling's $T^2$ test for multivariate Gaussian data.