Asymptotic optimality and minimal complexity of classification by random projection

The generalization error of a classifier is related to the complexity of the set of functions from which the classifier is chosen. Roughly speaking, the more complex the family, the greater the potential gap between the classifier's training error and its population error. This principle is expressed informally by Occam's razor, which favors low-complexity hypotheses over complex ones. We study a family of low-complexity classifiers that threshold a one-dimensional feature, obtained by embedding the data into a higher-dimensional space parametrized by monomials of order up to k and then projecting it onto a random line. More specifically, the extended data is projected n times, and the best of the resulting n classifiers, judged by performance on the training data, is chosen. We obtain a bound on the generalization error of these low-complexity classifiers. The bound is smaller than that of any classifier with a non-trivial VC dimension, and thus smaller than that of a linear classifier. We also show that, given full knowledge of the class-conditional densities, the error of the classifiers converges to the optimal (Bayes) error as k and n go to infinity; if only a training dataset is given, we show that the classifiers perfectly classify all the training points as k and n go to infinity.
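The procedure described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names (`monomial_features`, `fit_threshold`, `fit_random_projection_classifier`, `predict`) are hypothetical, the random direction is assumed to be a normalized Gaussian vector, labels are assumed to be 0 or 1, and the threshold for each projection is assumed to be chosen by minimizing training error. The abstract does not specify these details.

```python
import itertools
import numpy as np


def monomial_features(X, k):
    """Embed each row of X using all monomials of degree 1..k of its coordinates.

    The constant monomial is omitted because a constant shift of the
    projected feature is absorbed by the threshold.
    """
    d = X.shape[1]
    columns = []
    for degree in range(1, k + 1):
        for idx in itertools.combinations_with_replacement(range(d), degree):
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)


def fit_threshold(z, y):
    """Brute-force search for the threshold and orientation on a 1-D feature z.

    Returns (training error, threshold, sign). sign == 1 predicts class 1
    when z > threshold; sign == -1 predicts class 1 when z <= threshold.
    """
    candidates = np.concatenate(([-np.inf], np.sort(z)))
    best = (np.inf, 0.0, 1)
    for t in candidates:
        err = np.mean((z > t).astype(int) != y)
        # Flipping the orientation turns an error rate e into 1 - e.
        for sign, e in ((1, err), (-1, 1.0 - err)):
            if e < best[0]:
                best = (e, t, sign)
    return best


def fit_random_projection_classifier(X, y, k=2, n=50, seed=0):
    """Draw n random directions in the monomial space; keep the best on training data."""
    rng = np.random.default_rng(seed)
    Phi = monomial_features(X, k)
    best = None
    for _ in range(n):
        # Assumption: direction drawn as a normalized standard Gaussian vector.
        w = rng.standard_normal(Phi.shape[1])
        w /= np.linalg.norm(w)
        err, t, sign = fit_threshold(Phi @ w, y)
        if best is None or err < best["train_error"]:
            best = {"w": w, "threshold": t, "sign": sign,
                    "train_error": err, "k": k}
    return best


def predict(model, X):
    """Apply the selected projection and threshold to new points."""
    z = monomial_features(X, model["k"]) @ model["w"]
    pred = (z > model["threshold"]).astype(int)
    return pred if model["sign"] == 1 else 1 - pred
```

Under these assumptions, raising `k` enlarges the monomial embedding and raising `n` tries more random lines, which mirrors the two limits discussed in the abstract. The selected classifier is still described by a single direction index and a single threshold.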

Related content

Generalization error

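The generalization ability of a learning method is the ability of the model it learns to predict unseen data, and it is an essential property of the method. In practice, the most common way to assess generalization ability is to measure the generalization error on test data. A generalization error bound characterizes the gap between a learning algorithm's empirical risk and its expected risk, and the rate at which that gap converges. The generalization error of a learning machine is a function describing how far the student machine, after learning from sample data, remains from the teacher machine.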
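The quantities mentioned above can be written out as follows. The notation is an assumption introduced here for illustration (a loss function \ell, a sample of size m, a classifier f); it does not come from the source.

```latex
R(f) = \mathbb{E}_{(X,Y)}\big[\ell(f(X), Y)\big], \qquad
\hat{R}_m(f) = \frac{1}{m}\sum_{i=1}^{m} \ell\big(f(x_i), y_i\big), \qquad
\mathrm{gap}(f) = R(f) - \hat{R}_m(f).
```

Here R(f) is the expected risk, \hat{R}_m(f) is the empirical risk on the m training points, and a generalization error bound controls the size of the gap as m grows.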
