Support Recovery in Sparse PCA with Non-Random Missing Data

We analyze a practical algorithm for sparse PCA on incomplete and noisy data under a general non-random sampling scheme. The algorithm is based on a semidefinite relaxation of the $\ell_1$-regularized PCA problem. We provide theoretical justification that under certain conditions, we can recover the support of the sparse leading eigenvector with high probability by obtaining a unique solution. The conditions involve the spectral gap between the largest and second-largest eigenvalues of the true data matrix, the magnitude of the noise, and the structural properties of the observed entries. The concepts of algebraic connectivity and irregularity are used to describe the structural properties of the observed entries. We empirically justify our theorem with synthetic and real data analysis. We also show that our algorithm outperforms several other sparse PCA approaches especially when the observed entries have good structural properties. As a by-product of our analysis, we provide two theorems to handle a deterministic sampling scheme, which can be applied to other matrix-related problems.
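
As a rough sketch, one common way to write a semidefinite relaxation of $\ell_1$-regularized PCA is given below. The notation is assumed here for illustration and may differ from the paper's: $\hat{M}$ denotes the observed symmetric data matrix (assumed here to have its unobserved entries set to zero), $\rho > 0$ is a regularization weight, and $X$ is a positive semidefinite variable standing in for $xx^\top$.

$$
\hat{X} \in \arg\max_{X \succeq 0,\ \operatorname{tr}(X) = 1} \ \langle \hat{M}, X \rangle - \rho \, \|X\|_{1,1},
\qquad \|X\|_{1,1} = \sum_{i,j} |X_{ij}|
$$

Under this formulation, an estimate of the support of the sparse leading eigenvector would typically be read off from the nonzero diagonal entries of $\hat{X}$.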


Related content

PCA


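In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a collection of points in two, three, or higher dimensions, a "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.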
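
The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function names (`mean_center`, `covariance`, `top_eigenpair`, `pca`) are hypothetical, and power iteration with deflation is used in place of a library eigensolver only to keep the sketch dependency-free. Each round finds the direction of largest remaining variance, then removes it so that the next direction is orthogonal to the earlier ones.

```python
import math


def mean_center(points):
    """Subtract the per-coordinate mean from every point."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    return [[p[j] - means[j] for j in range(d)] for p in points]


def covariance(centered):
    """Sample covariance matrix of mean-centered data (needs n >= 2)."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenpair(mat, iters=1000):
    """Leading eigenvalue and unit eigenvector of a symmetric PSD matrix.

    Uses plain power iteration. The fixed start vector is an assumption of
    this sketch; it fails if it happens to be orthogonal to the leading
    eigenvector.
    """
    d = len(mat)
    v = [1.0 / (i + 1) for i in range(d)]
    scale = math.sqrt(sum(x * x for x in v))
    v = [x / scale for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # mat maps v to zero; no variance left in this direction
        v = [x / norm for x in w]
    # Rayleigh quotient v^T mat v gives the eigenvalue for unit v.
    eigval = sum(
        v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return eigval, v


def pca(points, k):
    """Return the first k (variance, direction) pairs, largest first."""
    cov = covariance(mean_center(points))
    d = len(cov)
    components = []
    for _ in range(k):
        lam, v = top_eigenpair(cov)
        components.append((lam, v))
        # Deflation: remove the found direction so the next one is orthogonal.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components


if __name__ == "__main__":
    # Points lying close to the line y = 2x.
    data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.1), (4.0, 7.9)]
    for variance, direction in pca(data, 2):
        print(variance, direction)
```

For points that lie close to a line, the first returned direction follows that line (up to sign) and carries almost all of the variance, while the second is perpendicular to it.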