Principal Component Analysis (PCA) is a fundamental tool for data visualization, denoising, and dimensionality reduction, and is widely used in Statistics, Machine Learning, Computer Vision, and related fields. However, PCA is well known to be highly sensitive to outliers and often fails to detect the true underlying low-dimensional structure of the dataset. Recent supervised learning methods based on the Median of Means (MoM) philosophy have shown great success in handling outlying observations with little compromise to their large-sample theoretical properties. In this paper, we propose a PCA procedure based on the MoM principle. Called Median of Means Principal Component Analysis (MoMPCA), the proposed method is not only computationally appealing but also achieves optimal convergence rates under minimal assumptions. In particular, we derive non-asymptotic error bounds for the obtained solution using Vapnik-Chervonenkis theory and Rademacher complexity, while imposing no assumptions whatsoever on the outlying observations. The efficacy of the proposal is thoroughly demonstrated through simulations and real data applications.
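To illustrate the Median of Means idea in a PCA setting, the sketch below partitions the observations into disjoint blocks, computes a covariance matrix within each block, and combines them via a coordinate-wise median before eigendecomposition. This is only a minimal illustrative variant of the MoM principle, not the paper's exact MoMPCA estimator or its theoretical construction; the function name, block count, and the coordinate-wise-median aggregation are assumptions made for the example.

```python
import numpy as np

def mom_pca_sketch(X, n_components=2, n_blocks=10, seed=0):
    """Illustrative Median-of-Means-style PCA (NOT the paper's exact estimator).

    Rows of X are randomly partitioned into n_blocks disjoint blocks; a sample
    covariance matrix is computed per block, and the coordinate-wise median of
    the block covariances serves as a robust covariance estimate, which is then
    eigendecomposed to obtain the leading principal directions.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    blocks = np.array_split(idx, n_blocks)
    covs = []
    for b in blocks:
        Xb = X[b] - X[b].mean(axis=0)      # center within the block
        covs.append(Xb.T @ Xb / len(b))     # per-block sample covariance
    robust_cov = np.median(np.stack(covs), axis=0)  # coordinate-wise median
    eigvals, eigvecs = np.linalg.eigh(robust_cov)
    order = np.argsort(eigvals)[::-1][:n_components]  # leading components first
    return eigvecs[:, order], eigvals[order]
```

Because each gross outlier can contaminate only the block it falls in, the median across blocks discards the contaminated covariance entries as long as fewer than half of the blocks contain outliers, which is the intuition behind MoM-style robustness.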