Nonparametric feature selection in high-dimensional data is an important and challenging problem in statistics and machine learning. Most existing methods for feature selection focus on parametric or additive models, which may suffer from model misspecification. In this paper, we propose a new framework to perform nonparametric feature selection for both regression and classification problems. In this framework, we learn prediction functions through empirical risk minimization over a reproducing kernel Hilbert space. The space is generated by a novel tensor product kernel that depends on a set of parameters determining the importance of the features. Computationally, we minimize the penalized empirical risk to estimate the prediction function and the kernel parameters simultaneously. The solution can be obtained by iteratively solving convex optimization problems. We study the theoretical properties of the kernel feature space and prove both the oracle selection property and the Fisher consistency of our proposed method. Finally, we demonstrate the superior performance of our approach compared to existing methods via extensive simulation studies and an application to a microarray study of eye disease in animals.
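The selection mechanism described above can be illustrated with a minimal sketch. This is not the paper's actual kernel or penalty: it assumes a Gaussian-type kernel with nonnegative per-feature importance weights `theta`, where setting a weight to zero removes that feature from the kernel entirely, and fits the prediction function by plain kernel ridge regression for a fixed `theta` (the paper instead alternates penalized updates of the prediction function and the kernel parameters).

```python
import numpy as np

def weighted_gaussian_kernel(X, Z, theta):
    """Gram matrix of exp(-sum_j theta_j * (x_j - z_j)^2).

    theta_j >= 0 weights feature j; theta_j = 0 drops feature j
    from the kernel, which is the feature-selection mechanism.
    (Illustrative stand-in for the paper's tensor product kernel.)
    """
    diff = X[:, None, :] - Z[None, :, :]          # shape (n, m, d)
    return np.exp(-(diff ** 2 * theta).sum(axis=-1))

def kernel_ridge_fit(K, y, lam):
    """Closed-form kernel ridge coefficients: (K + lam I)^{-1} y."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.sin(X[:, 0])                               # only feature 0 is relevant

theta = np.array([1.0, 0.0, 0.0])                 # features 1 and 2 switched off
K = weighted_gaussian_kernel(X, X, theta)
alpha = kernel_ridge_fit(K, y, lam=1e-3)
y_hat = K @ alpha                                 # fitted values on training data
```

With `theta = [1, 0, 0]` the Gram matrix is identical to one computed from feature 0 alone, so the irrelevant features have no influence on the fitted function; in the actual method, a sparsity penalty on the kernel parameters drives such weights to zero automatically.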