Semi-supervised Wrapper Feature Selection with Imperfect Labels

In this paper, we propose a new wrapper approach for semi-supervised feature selection. A common strategy in semi-supervised learning is to augment the training set with pseudo-labeled unlabeled examples. However, the pseudo-labeling procedure is error-prone and risks disrupting the learning algorithm by adding noisily labeled training data. To overcome this, we propose to explicitly model the mislabeling error during the learning phase, with the overall aim of selecting the most relevant features. We derive a $\mathcal{C}$-bound for Bayes classifiers trained over partially labeled training sets that takes the mislabeling errors into account. The risk bound is then used as an objective function that is minimized over the space of possible feature subsets using a genetic algorithm. To produce solutions that are both sparse and accurate, we propose a modification of the genetic algorithm with a crossover based on feature weights and recursive elimination of irrelevant features. Empirical results on different data sets show the effectiveness of our framework compared to several state-of-the-art semi-supervised feature selection approaches.
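The wrapper idea in the abstract, a genetic algorithm searching over feature subsets to minimize an objective, can be illustrated with a minimal sketch. This is not the authors' method: the objective below is a plain validation error of a nearest-centroid classifier plus a sparsity penalty, standing in for the $\mathcal{C}$-bound, and the crossover is a standard uniform crossover rather than the weight-based one with recursive feature elimination. The toy data, the function names (`make_data`, `centroid_error`, `genetic_select`), and all parameter values are assumptions made for illustration.

```python
import random

def make_data(n=200, n_features=8, n_informative=2, seed=0):
    """Toy binary data: only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = 2.0 if label == 1 else -2.0
        X.append([rng.gauss(shift if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def centroid_error(mask, X_train, y_train, X_val, y_val):
    """Validation error of a nearest-centroid classifier on the selected features."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 1.0  # an empty subset is treated as the worst case
    centroids = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    errors = 0
    for x, t in zip(X_val, y_val):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(idx, centroids[c]))
                for c in (0, 1)}
        errors += min(dist, key=dist.get) != t
    return errors / len(y_val)

def genetic_select(X_train, y_train, X_val, y_val, pop_size=20, generations=15,
                   mutation_rate=0.1, penalty=0.01, seed=0):
    """Search feature masks with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    n_features = len(X_train[0])

    def cost(mask):
        # Stand-in objective (assumption): error plus a per-feature penalty.
        return centroid_error(mask, X_train, y_train, X_val, y_val) + penalty * sum(mask)

    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[: pop_size // 2]  # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    best = min(population, key=cost)
    return best, cost(best)

X, y = make_data()
X_train, y_train, X_val, y_val = X[:140], y[:140], X[140:], y[140:]
best_mask, best_cost = genetic_select(X_train, y_train, X_val, y_val)
selected = [j for j, keep in enumerate(best_mask) if keep]
print("selected features:", selected, "cost:", round(best_cost, 3))
```

The sparsity penalty plays the role that the abstract assigns to recursive elimination: it pushes the search toward small subsets. Swapping `cost` for a different risk estimate changes the objective without touching the search loop.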


Related content

Feature selection


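Feature selection, also called feature subset selection (FSS) or attribute selection, is the process of choosing N features out of M existing features so that a specific criterion of the system is optimized. It picks the most effective features from the original set in order to reduce the dimensionality of the data set. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are key to training the model.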
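As a minimal sketch of the "choose N out of M features to optimize a criterion" definition, the snippet below enumerates every size-N subset and keeps the best-scoring one. The `relevance` values and the additive `score` function are invented for illustration. With an additive score the result is simply the top-N features, but the same exhaustive search works for any criterion, at a cost that grows combinatorially with M.

```python
from itertools import combinations

def best_subset(n_total, n_select, score):
    """Return the size-n_select subset of range(n_total) with the highest score."""
    return max(combinations(range(n_total), n_select), key=score)

# Hypothetical per-feature relevance values, invented for illustration only.
relevance = [0.9, 0.1, 0.4, 0.8, 0.05]

def score(subset):
    """Toy criterion (assumption): sum of the relevance of the chosen features."""
    return sum(relevance[j] for j in subset)

chosen = best_subset(len(relevance), 2, score)
print(chosen)  # (0, 3)
```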
