The statistical and computational performance of sparse principal component analysis (PCA) can be dramatically improved when the principal components are allowed to be sparse in a rotated eigenbasis. To exploit this, we propose a new method for sparse PCA. In the simplest version of the algorithm, the component scores and loadings are initialized with a low-rank singular value decomposition. Then, an orthogonal rotation is applied to the singular vectors to make them approximately sparse. Finally, soft-thresholding is applied to the rotated singular vectors. This approach differs from prior approaches because it uses an orthogonal rotation to approximate a sparse basis. Our sparse PCA framework is versatile; for example, it extends naturally to the two-way analysis of a data matrix for simultaneous dimensionality reduction of rows and columns. We identify the close relationship between sparse PCA and independent component analysis for separating sparse signals. We provide empirical evidence that, for the same level of sparsity, the proposed sparse PCA method is more stable and explains more variance than alternative methods. Through three applications (sparse coding of images, analysis of transcriptome sequencing data, and large-scale clustering of Twitter accounts), we demonstrate the usefulness of sparse PCA in exploring modern multivariate data.
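The three steps of the simplest version described above (low-rank SVD, orthogonal rotation, soft-thresholding) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say which rotation criterion is used, so varimax is assumed here, and the function names, the rank `k`, and the `threshold` parameter are hypothetical choices made for the example.

```python
import numpy as np


def varimax(V, max_iter=100, tol=1e-8):
    """Return an orthogonal k x k matrix R such that V @ R is approximately sparse.

    Assumption: varimax is used as the rotation criterion; the abstract only
    says "orthogonal rotation".
    """
    k = V.shape[1]
    R = np.eye(k)
    prev = 0.0
    for _ in range(max_iter):
        L = V @ R
        # Gradient of the varimax criterion with respect to the rotation.
        G = V.T @ (L**3 - L * (L**2).mean(axis=0))
        # Projecting the gradient onto the orthogonal group via its SVD.
        U, s, Wt = np.linalg.svd(G)
        R = U @ Wt
        obj = s.sum()
        if prev != 0.0 and obj / prev < 1.0 + tol:
            break
        prev = obj
    return R


def soft_threshold(A, t):
    """Shrink every entry of A toward zero by t; entries with |a| <= t become 0."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)


def sparse_pca_sketch(X, k, threshold):
    """Hypothetical one-pass sketch: SVD, then rotation, then soft-thresholding."""
    # Step 1: low-rank SVD; the columns of V are the top-k right singular vectors.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:k].T
    # Step 2: orthogonal rotation toward an approximately sparse basis.
    R = varimax(V)
    # Step 3: soft-threshold the rotated singular vectors.
    Y = soft_threshold(V @ R, threshold)
    return Y, R
```

Calling `sparse_pca_sketch(X, k=3, threshold=0.1)` on an `n x p` data matrix returns a `p x 3` matrix of thresholded loadings together with the rotation that was applied. Because the rotation is orthogonal, the rotated vectors span the same subspace as the original singular vectors before thresholding; only the basis within that subspace changes.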