While testing multivariate normality has received considerable attention in the classical low-dimensional setting, where the sample size $n$ is much larger than the feature dimension $d$ of the data, few existing tests are valid in the high-dimensional setting, where $d$ is of comparable or larger order than $n$. This paper studies the hypothesis testing problem of determining whether $n$ i.i.d. samples are generated from a $d$-dimensional multivariate normal distribution when $d$ grows with $n$ over a broad range of rates. To this end, we propose a new class of computationally efficient tests that can be regarded as a high-dimensional adaptation of the classical radial approach to testing normality. A key member of this class is a range-type test which, under a very general rate of growth of $d$ with respect to $n$, is proven to achieve both type I error control and consistency against three important classes of alternatives: finite mixture model, non-Gaussian elliptical, and leptokurtic alternatives. Extensive simulation studies demonstrate the superiority of our test over existing methods, and two gene expression applications demonstrate the effectiveness of our procedure in detecting violations of multivariate normality of potential practical significance.