Principal Component Analysis (PCA) is a fundamental tool for data visualization, denoising, and dimensionality reduction, and is widely used in Statistics, Machine Learning, Computer Vision, and related fields. However, PCA is well known to be highly sensitive to outliers and often fails to detect the true underlying low-dimensional structure of the dataset. Recent supervised learning methods based on the Median of Means (MoM) philosophy have shown great success in handling outlying observations with little compromise to their large-sample theoretical properties. In this paper, we propose a PCA procedure based on the MoM principle. Called Median of Means Principal Component Analysis (MoMPCA), the proposed method is not only computationally appealing but also achieves optimal convergence rates under minimal assumptions. In particular, we derive non-asymptotic error bounds for the obtained solution using Vapnik-Chervonenkis theory and Rademacher complexity, while imposing no assumptions whatsoever on the outlying observations. The efficacy of the proposal is thoroughly demonstrated through simulations and real data applications.
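To illustrate the Median of Means idea in a PCA setting, the sketch below partitions the observations into disjoint blocks, computes a covariance matrix within each block, and combines them via a coordinate-wise median before eigendecomposition. This is only a minimal illustrative variant of the MoM principle, not the paper's exact MoMPCA estimator or its theoretical construction; the function name, block count, and the coordinate-wise-median aggregation are assumptions made for the example.

```python
import numpy as np

def mom_pca_sketch(X, n_components=2, n_blocks=10, seed=0):
    """Illustrative Median-of-Means-style PCA (NOT the paper's exact estimator).

    Rows of X are randomly partitioned into n_blocks disjoint blocks; a sample
    covariance matrix is computed per block, and the coordinate-wise median of
    the block covariances serves as a robust covariance estimate, which is then
    eigendecomposed to obtain the leading principal directions.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    blocks = np.array_split(idx, n_blocks)
    covs = []
    for b in blocks:
        Xb = X[b] - X[b].mean(axis=0)      # center within the block
        covs.append(Xb.T @ Xb / len(b))     # per-block sample covariance
    robust_cov = np.median(np.stack(covs), axis=0)  # coordinate-wise median
    eigvals, eigvecs = np.linalg.eigh(robust_cov)
    order = np.argsort(eigvals)[::-1][:n_components]  # leading components first
    return eigvecs[:, order], eigvals[order]
```

Because each gross outlier can contaminate only the block it falls in, the median across blocks discards the contaminated covariance entries as long as fewer than half of the blocks contain outliers, which is the intuition behind MoM-style robustness.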