Principal Component Analysis (PCA) is a widely used tool for dimensionality reduction and feature extraction in data analysis, and Probabilistic PCA (PPCA) is its probabilistic formulation. However, neither standard PCA nor PPCA is robust: both are sensitive to outliers. To alleviate this problem, this paper introduces the Self-Paced Learning mechanism into PPCA and proposes a novel method called Self-Paced Probabilistic Principal Component Analysis (SP-PPCA). We also design a corresponding optimization algorithm based on the alternative search strategy and the expectation-maximization algorithm. SP-PPCA iteratively searches for optimal projection vectors and filters out outliers. Experiments on both synthetic problems and real-world datasets demonstrate that SP-PPCA reduces or eliminates the impact of outliers.
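The abstract does not state the SP-PPCA objective or its update rules, so the following is only a minimal sketch of the general idea (alternating between fitting PPCA and re-selecting "easy" samples), not the paper's algorithm. It rests on several assumptions: hard 0/1 self-paced weights chosen by a quantile of the per-sample negative log-likelihood, a closed-form weighted maximum-likelihood PPCA fit standing in for the EM iterations, and an admitted fraction of samples that grows each round. All function names and parameters (`fit_weighted_ppca`, `per_sample_nll`, `self_paced_ppca`, `start_quantile`, `growth`, `max_quantile`) are hypothetical.

```python
import numpy as np


def fit_weighted_ppca(X, weights, n_components):
    """Closed-form ML fit of PPCA on weighted samples (assumed stand-in for EM)."""
    w = weights / weights.sum()
    mean = w @ X
    Xc = X - mean
    cov = (Xc * w[:, None]).T @ Xc              # weighted sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Noise variance: average of the discarded eigenvalues (needs n_components < d).
    sigma2 = max(eigvals[n_components:].mean(), 1e-12)
    scale = np.sqrt(np.maximum(eigvals[:n_components] - sigma2, 0.0))
    W = eigvecs[:, :n_components] * scale       # projection (loading) matrix
    return mean, W, sigma2


def per_sample_nll(X, mean, W, sigma2):
    """Negative log-likelihood of each sample under N(mean, W W^T + sigma2 I)."""
    d = X.shape[1]
    C = W @ W.T + sigma2 * np.eye(d)
    Xc = X - mean
    _, logdet = np.linalg.slogdet(C)
    maha = np.einsum("ij,ij->i", Xc, np.linalg.solve(C, Xc.T).T)
    return 0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def self_paced_ppca(X, n_components, start_quantile=0.5, growth=1.1,
                    max_quantile=0.9, n_rounds=10):
    """Alternate between a weighted PPCA fit and hard self-paced sample selection."""
    weights = np.ones(len(X))
    q = start_quantile
    for _ in range(n_rounds):
        # Step 1: fix the sample weights, update the PPCA parameters.
        mean, W, sigma2 = fit_weighted_ppca(X, weights, n_components)
        # Step 2: fix the parameters, keep only samples with low loss.
        loss = per_sample_nll(X, mean, W, sigma2)
        weights = (loss <= np.quantile(loss, q)).astype(float)
        # Admit a larger share of samples in the next round.
        q = min(q * growth, max_quantile)
    # Final fit on the samples selected in the last round.
    mean, W, sigma2 = fit_weighted_ppca(X, weights, n_components)
    return mean, W, sigma2, weights
```

Calling `self_paced_ppca(X, n_components=k)` on an `(n_samples, n_features)` array returns the estimated mean, the loading matrix, the noise variance, and the final 0/1 sample weights; samples left with weight 0 are the ones this sketch treats as outliers. Capping the admitted fraction at `max_quantile` is a design choice of the sketch that always discards the highest-loss samples, which only makes sense when some contamination is expected.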