This paper addresses the problem of inverse covariance (also known as precision matrix) estimation in high-dimensional settings. Specifically, we focus on two classes of estimators: linear shrinkage estimators with a target proportional to the identity matrix, and estimators derived from data augmentation (DA). Here, DA refers to the common practice of enriching a dataset with artificial samples, typically generated via a generative model or through random transformations of the original data, prior to model fitting. For both classes, we derive estimators and provide concentration bounds for their quadratic error. These bounds allow for both method comparison and hyperparameter tuning, such as selecting the optimal proportion of artificial samples. On the technical side, our analysis relies on tools from random matrix theory. We introduce a novel deterministic equivalent for generalized resolvent matrices that accommodates dependent samples with a specific structure. We support our theoretical results with numerical experiments.
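For concreteness, one common construction of a linear shrinkage precision estimator with an identity target is sketched below; the notation ($\rho$ for the shrinkage intensity, $\nu$ for the target scale, $p$ for the dimension, $n$ for the sample size, $m$ for the number of artificial samples) is illustrative and the paper's exact estimators may differ:
\[
\widehat{\Theta}_{\rho} \;=\; \Bigl((1-\rho)\,\widehat{\Sigma} + \rho\,\nu\, I_p\Bigr)^{-1},
\qquad
\widehat{\Sigma} \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i x_i^{\top},
\]
where $\rho \in (0,1]$ and $\nu > 0$. A DA-based estimator of this kind would apply the same construction to the sample covariance of the augmented dataset $\{x_1,\dots,x_n,\tilde{x}_1,\dots,\tilde{x}_m\}$, with $\tilde{x}_1,\dots,\tilde{x}_m$ the artificial samples; the quadratic-error estimates and concentration bounds are what make it possible to compare such estimators and to tune quantities like $\rho$, $\nu$, or the proportion $m/n$.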