Optimal Projected Variance Group-Sparse Block PCA

We address the problem of defining a group-sparse formulation of Principal Component Analysis (PCA), or of its equivalent formulations as low-rank approximation or dictionary learning problems, that achieves a compromise between maximizing the variance explained by the components and promoting sparsity of the loadings. We first propose a new definition of the variance explained by components that are not necessarily orthogonal; this definition is optimal in a certain sense and is compatible with the case where the components are the principal components. We then regularize this variance with the group-$\ell_{1}$ norm to define a Group Sparse Maximum Variance (GSMV) formulation of PCA. The GSMV formulation achieves our objective by construction, and has the convenient property that the inner non-smooth optimization problem can be solved analytically. This reduces GSMV to the maximization of a smooth convex function under unit-norm and orthogonality constraints, which generalizes Journée et al. (2010) to group sparsity. Numerical comparison with deflation on synthetic data shows that GSMV produces consistently slightly better and more robust results for the retrieval of hidden sparse structures, and is about three times faster on these examples. An application to real data illustrates the value of group sparsity for variable selection in PCA of mixed (categorical/numerical) data.
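To make the group-$\ell_{1}$ penalty concrete, the following is a minimal Python sketch of the group-$\ell_{1}$ norm and of group soft-thresholding, the standard closed-form proximal operator of that norm. It is meant only to illustrate why a non-smooth group-sparse subproblem can admit an analytic solution; it is not the GSMV algorithm, and the function names, the example vector, the grouping, and the threshold `lam` are all assumptions introduced here.

```python
import numpy as np

def group_l1_norm(w, groups):
    """Group-l1 norm: sum of the Euclidean norms of the sub-vectors of w
    selected by each index group (groups are assumed non-overlapping)."""
    return sum(np.linalg.norm(w[g]) for g in groups)

def group_soft_threshold(w, groups, lam):
    """Proximal operator of lam * group-l1 norm.
    Each group's norm is shrunk by lam; groups whose norm is <= lam
    are set to zero as a whole, which is what produces group sparsity."""
    out = np.zeros_like(w, dtype=float)
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > lam:
            out[g] = (1.0 - lam / norm) * w[g]
    return out

# Hypothetical loading vector with three variable groups
# (e.g. the indicator columns of one categorical variable form one group).
w = np.array([3.0, 4.0, 0.1, -0.2, 1.0])
groups = [[0, 1], [2, 3], [4]]

shrunk = group_soft_threshold(w, groups, lam=0.5)
penalty = group_l1_norm(w, groups)
```

In this example the weak second group is removed entirely while the other two groups are only shrunk, which is the all-or-nothing selection of variable groups that makes group sparsity useful for mixed categorical/numerical data.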

