Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA)

We present a novel technique for sparse principal component analysis. This method, named Eigenvectors from Eigenvalues Sparse Principal Component Analysis (EESPCA), is based on the formula for computing squared eigenvector loadings of a Hermitian matrix from the eigenvalues of the full matrix and associated sub-matrices. We explore two versions of the EESPCA method: a version that uses a fixed threshold for inducing sparsity and a version that selects the threshold via cross-validation. Relative to the state-of-the-art sparse PCA methods of Witten et al., Yuan & Zhang and Tan et al., the fixed threshold EESPCA technique offers an order-of-magnitude improvement in computational speed, does not require estimation of tuning parameters via cross-validation, and can more accurately identify true zero principal component loadings across a range of data matrix sizes and covariance structures. Importantly, the EESPCA method achieves these benefits while maintaining out-of-sample reconstruction error and PC estimation error close to the lowest error generated by all evaluated approaches. EESPCA is a practical and effective technique for sparse PCA with particular relevance to computationally demanding statistical problems such as the analysis of high-dimensional data sets or application of statistical techniques like resampling that involve the repeated calculation of sparse PCs.
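The method rests on the eigenvector-eigenvalue identity for Hermitian matrices. For an n × n Hermitian matrix A with eigenvalues λ_1(A), …, λ_n(A) and unit eigenvectors v_1, …, v_n, let M_j be the (n − 1) × (n − 1) submatrix obtained by deleting row j and column j of A. Then |v_{i,j}|² · ∏_{k≠i} (λ_i(A) − λ_k(A)) = ∏_{k=1}^{n−1} (λ_i(A) − λ_k(M_j)). The sketch below is a minimal NumPy check of that identity, assuming A is real symmetric with distinct eigenvalues. It computes every squared loading from eigenvalues alone and compares the result with the eigenvectors returned by np.linalg.eigh. It illustrates only the underlying formula, not the EESPCA algorithm or its thresholding step, and the function name squared_loadings_from_eigenvalues is hypothetical.

```python
import numpy as np


def squared_loadings_from_eigenvalues(A):
    """Hypothetical helper: return S with S[i, j] = |v_{i,j}|^2, the squared
    j-th loading of the i-th eigenvector of symmetric A, using only eigenvalues
    of A and of its principal submatrices M_j. Assumes distinct eigenvalues."""
    n = A.shape[0]
    lam = np.linalg.eigvalsh(A)  # eigenvalues of A, ascending
    S = np.empty((n, n))
    for j in range(n):
        # M_j: A with row j and column j removed
        keep = [k for k in range(n) if k != j]
        mu = np.linalg.eigvalsh(A[np.ix_(keep, keep)])
        for i in range(n):
            numerator = np.prod(lam[i] - mu)                 # prod_k (lambda_i(A) - lambda_k(M_j))
            denominator = np.prod(np.delete(lam[i] - lam, i))  # prod_{k != i} (lambda_i(A) - lambda_k(A))
            S[i, j] = numerator / denominator
    return S


# Usage: compare against squared eigenvector entries from a standard solver.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
A = np.cov(X, rowvar=False)          # 5 x 5 sample covariance (symmetric)
S = squared_loadings_from_eigenvalues(A)
_, vecs = np.linalg.eigh(A)          # columns of vecs are eigenvectors, same ascending order
assert np.allclose(S, vecs.T ** 2)   # row i of vecs.T**2 holds eigenvector i's squared loadings
```

This brute-force form performs n + 1 eigenvalue decompositions, so it serves as a correctness check rather than an efficient implementation. For the first principal component, only the row for the largest eigenvalue (the last index, since eigvalsh sorts ascending) is needed.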

PCA

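In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional space by maximizing the variance captured along each new dimension. Given a set of points in two-, three-, or higher-dimensional space, the "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line is chosen in the same way from directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.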
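The procedure described above can be sketched with an eigendecomposition of the sample covariance matrix: the eigenvectors with the largest eigenvalues give the successive best-fitting orthogonal directions, and projecting the centered data onto them yields uncorrelated coordinates. This is a minimal NumPy sketch, assuming rows are observations and columns are variables; the function name pca and the example data are illustrative choices, not part of the original text.

```python
import numpy as np


def pca(X, k):
    """Illustrative helper: project rows of X (n samples x p variables) onto
    the top-k principal components. Returns (components, scores, variances)."""
    Xc = X - X.mean(axis=0)            # center each variable
    cov = np.cov(Xc, rowvar=False)     # p x p sample covariance
    vals, vecs = np.linalg.eigh(cov)   # ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(vals)[::-1]     # reorder by decreasing variance
    components = vecs[:, order[:k]]    # p x k orthonormal basis of the top-k directions
    scores = Xc @ components           # n x k projected data
    return components, scores, vals[order[:k]]


# Usage: correlated 3-D data reduced to 2 dimensions.
rng = np.random.default_rng(1)
mixing = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 0.2]])
X = rng.standard_normal((200, 3)) @ mixing
components, scores, variances = pca(X, 2)
# The projected coordinates are uncorrelated: their covariance is diagonal,
# with the component variances (eigenvalues) on the diagonal.
assert np.allclose(np.cov(scores, rowvar=False), np.diag(variances))
```

Using np.linalg.eigh rather than a general eigensolver exploits the symmetry of the covariance matrix and guarantees real eigenvalues and orthonormal eigenvectors.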