Feature selection is a key step in many tabular prediction problems, where multiple candidate variables may be redundant, noisy, or weakly informative. We investigate feature selection based on Kolmogorov-Arnold Networks (KANs), which parameterize feature transformations with splines and naturally expose per-feature importance scores. From this idea we derive four KAN-based selection criteria (coefficient norms, including an $\ell_1$ variant; gradient-based saliency; and knockout scores) and compare them with standard methods such as LASSO, Random Forest feature importance, Mutual Information, and SVM-RFE on a suite of real and synthetic classification and regression datasets. Using average F1 and $R^2$ scores across three feature-retention levels (20%, 40%, 60%), we find that KAN-based selectors are generally competitive with, and sometimes superior to, classical baselines. In classification, KAN criteria often match or exceed existing methods on multi-class tasks by removing redundant features and capturing nonlinear interactions. In regression, KAN-based scores provide robust performance on noisy and heterogeneous datasets, closely tracking strong ensemble predictors; we also observe characteristic failure modes, such as overly aggressive pruning with the $\ell_1$ criterion. Stability and redundancy analyses further show that KAN-based selectors yield reproducible feature subsets across folds while avoiding unnecessary correlation inflation, supporting reliable, non-redundant variable selection. Overall, our findings demonstrate that KAN-based feature selection provides a powerful and interpretable alternative to traditional methods, capable of uncovering nonlinear and multivariate feature relevance beyond sparsity- or impurity-based measures.
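To make the three score families named above concrete, the following is a minimal sketch of how per-feature importance can be read off an additive model in the KAN spirit. It uses a Gaussian RBF expansion per feature as a stand-in for the paper's spline parameterization, with a ridge-regressed linear readout; the function names, the RBF basis, and all hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def kan_style_scores(X, y, n_basis=8, ridge=1e-2):
    """Fit an additive model y ~ sum_j f_j(x_j), each f_j a small RBF expansion
    (an illustrative stand-in for KAN splines), then return three importance
    scores per feature: coefficient norms, gradient saliency, knockout deltas."""
    n, d = X.shape
    phis, dphis = [], []
    for j in range(d):
        centers = np.linspace(X[:, j].min(), X[:, j].max(), n_basis)
        width = centers[1] - centers[0] if n_basis > 1 else 1.0
        diff = X[:, j][:, None] - centers[None, :]
        phi = np.exp(-diff ** 2 / (2 * width ** 2))
        phis.append(phi)
        dphis.append(-diff / width ** 2 * phi)  # d/dx of each basis function

    Phi = np.hstack(phis)                       # (n, d * n_basis) design matrix
    A = Phi.T @ Phi + ridge * np.eye(d * n_basis)
    w = np.linalg.solve(A, Phi.T @ y)           # ridge-regularized readout
    blocks = w.reshape(d, n_basis)              # per-feature coefficient blocks
    base_mse = np.mean((y - Phi @ w) ** 2)

    # (1) coefficient norm: size of each feature's learned transformation
    coef_norm = np.linalg.norm(blocks, axis=1)
    # (2) gradient saliency: mean |d y_hat / d x_j| over the training samples
    grad = np.array([np.mean(np.abs(dphis[j] @ blocks[j])) for j in range(d)])
    # (3) knockout: MSE increase when feature j's contribution is frozen at
    # its mean (its basis columns replaced by their column means)
    knockout = np.empty(d)
    for j in range(d):
        Phi_k = Phi.copy()
        cols = slice(j * n_basis, (j + 1) * n_basis)
        Phi_k[:, cols] = Phi[:, cols].mean(axis=0)
        knockout[j] = np.mean((y - Phi_k @ w) ** 2) - base_mse
    return coef_norm, grad, knockout

# Demo on synthetic data: only features 0 and 1 carry (nonlinear) signal.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 6))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.normal(size=400)
coef_norm, grad, knockout = kan_style_scores(X, y)
```

A selector would then keep the top-k features under whichever score is chosen; the knockout score directly measures predictive loss from removing a feature, while the coefficient norm is the cheapest to compute.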