In machine learning and statistics, it is often desirable to reduce the dimensionality of a sample of data points lying in a high-dimensional space $\mathbb{R}^d$. This paper introduces a dimensionality reduction method in which the embedding coordinates are the eigenvectors of a positive semi-definite kernel, obtained as the solution of an infinite-dimensional analogue of a semi-definite program. The embedding is adaptive and non-linear. A main feature of our approach is the existence of a non-linear out-of-sample extension formula for the embedding coordinates, called a projected Nystr\"om approximation. This extrapolation formula yields an extension of the kernel matrix to a data-dependent Mercer kernel function. Our empirical results indicate that this embedding method is more robust to the influence of outliers than a spectral embedding method.
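For context, a minimal sketch of the classical Nystr\"om out-of-sample extension, which the projected variant described above builds on; the notation here (kernel $k(\cdot,\cdot)$, sample points $x_1,\dots,x_n$, and eigenpairs $(\lambda_k, v_k)$ of the kernel matrix $K$) is our own illustrative choice, not necessarily the paper's:
\[
\varphi_k(x) \;=\; \frac{1}{\lambda_k} \sum_{i=1}^{n} k(x, x_i)\, v_k(i),
\qquad \text{where } K v_k = \lambda_k v_k,\; K_{ij} = k(x_i, x_j).
\]
Evaluated at a training point $x = x_j$, this formula recovers the $j$-th entry of the eigenvector $v_k$, so it extends the eigenvector coordinates consistently to new points $x \in \mathbb{R}^d$.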