Nonparametric feature selection in high-dimensional data is an important and challenging problem in statistics and machine learning. Most existing methods for feature selection focus on parametric or additive models, which may suffer from model misspecification. In this paper, we propose a new framework to perform nonparametric feature selection for both regression and classification problems. In this framework, we learn prediction functions through empirical risk minimization over a reproducing kernel Hilbert space. The space is generated by a novel tensor product kernel that depends on a set of parameters determining the importance of the features. Computationally, we minimize the penalized empirical risk to estimate the prediction function and the kernel parameters simultaneously. The solution can be obtained by iteratively solving convex optimization problems. We study the theoretical properties of the kernel feature space and prove both the oracle selection property and the Fisher consistency of our proposed method. Finally, we demonstrate the superior performance of our approach compared to existing methods via extensive simulation studies and an application to a microarray study of eye disease in animals.
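The selection mechanism described above can be illustrated with a minimal sketch. This is not the paper's actual kernel or penalty: it assumes a Gaussian-type kernel with nonnegative per-feature importance weights `theta`, where setting a weight to zero removes that feature from the kernel entirely, and fits the prediction function by plain kernel ridge regression for a fixed `theta` (the paper instead alternates penalized updates of the prediction function and the kernel parameters).

```python
import numpy as np

def weighted_gaussian_kernel(X, Z, theta):
    """Gram matrix of exp(-sum_j theta_j * (x_j - z_j)^2).

    theta_j >= 0 weights feature j; theta_j = 0 drops feature j
    from the kernel, which is the feature-selection mechanism.
    (Illustrative stand-in for the paper's tensor product kernel.)
    """
    diff = X[:, None, :] - Z[None, :, :]          # shape (n, m, d)
    return np.exp(-(diff ** 2 * theta).sum(axis=-1))

def kernel_ridge_fit(K, y, lam):
    """Closed-form kernel ridge coefficients: (K + lam I)^{-1} y."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.sin(X[:, 0])                               # only feature 0 is relevant

theta = np.array([1.0, 0.0, 0.0])                 # features 1 and 2 switched off
K = weighted_gaussian_kernel(X, X, theta)
alpha = kernel_ridge_fit(K, y, lam=1e-3)
y_hat = K @ alpha                                 # fitted values on training data
```

With `theta = [1, 0, 0]` the Gram matrix is identical to one computed from feature 0 alone, so the irrelevant features have no influence on the fitted function; in the actual method, a sparsity penalty on the kernel parameters drives such weights to zero automatically.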