Feature selection has become an important step in many machine learning paradigms. In domains such as bioinformatics and text classification, which involve high-dimensional data, feature selection can drastically reduce the feature space. When it is difficult or infeasible to obtain a sufficient number of training examples, feature selection helps overcome the curse of dimensionality, which in turn improves the performance of the classification algorithm. This research focuses on five embedded feature selection methods that use ridge regression, Lasso regression, or a combination of the two in the regularization part of the optimization function. We evaluate the five methods on five high-dimensional datasets and compare them with respect to the sparsity and correlation in the datasets and their execution times.
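As a rough illustration of what "ridge, Lasso, or a combination of the two in the regularization part" can look like, the generic penalized objective below may be useful. It is a sketch under assumptions, not the formulation of any of the five methods: the loss $\mathcal{L}$, the weight vector $w$, the data $(X, y)$, and the penalty weights $\lambda_1, \lambda_2$ are notation introduced here for illustration.

```latex
% Generic regularized objective (notation assumed for illustration):
%   \mathcal{L}  : a data-fitting loss, e.g. squared error or hinge loss
%   \lambda_1    : weight of the Lasso (L1) penalty
%   \lambda_2    : weight of the ridge (squared L2) penalty
\min_{w} \; \mathcal{L}(w; X, y)
  \;+\; \lambda_1 \lVert w \rVert_1
  \;+\; \lambda_2 \lVert w \rVert_2^2

% Special cases:
%   \lambda_1 = 0,\ \lambda_2 > 0   ridge penalty only
%   \lambda_1 > 0,\ \lambda_2 = 0   Lasso penalty only
%   \lambda_1 > 0,\ \lambda_2 > 0   combination of the two (elastic-net style)
```

In this form, the L1 term can drive individual coefficients exactly to zero, which is what makes the regularizer act as a feature selector, while the squared L2 term shrinks coefficients without zeroing them and tends to spread weight across correlated features.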