There is a fundamental limitation in the prediction performance that a machine learning model can achieve due to the inevitable uncertainty of the prediction target. In classification problems, this can be characterized by the Bayes error, which is the best achievable error with any classifier. The Bayes error can be used as a criterion to evaluate classifiers with state-of-the-art performance and to detect test set overfitting. We propose a simple and direct Bayes error estimator, where we just take the mean of the labels that show \emph{uncertainty} of the classes. Our flexible approach enables us to perform Bayes error estimation even for weakly supervised data. In contrast to others, our method is model-free and even instance-free. Moreover, it has no hyperparameters and gives a more accurate estimate of the Bayes error than classifier-based baselines. Experiments using our method suggest that a recently proposed classifier, the Vision Transformer, may have already reached the Bayes error for certain benchmark datasets.
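As a minimal sketch of the idea for binary classification: if each instance comes with a soft label giving the class posterior $p(y=1\mid x)$ (e.g. from averaged annotator votes), the Bayes error is the expectation of the per-instance uncertainty $\min(p, 1-p)$, so its plug-in estimate is simply a mean over these values. The function name and example values below are hypothetical, not from the paper.

```python
import numpy as np

def estimate_bayes_error(soft_labels):
    """Plug-in Bayes error estimate for binary classification.

    soft_labels: class-posterior probabilities p(y=1|x) for a sample of
    instances. The Bayes error is E[min(p, 1-p)], so the estimate is just
    the mean of the per-instance uncertainties -- no trained model and no
    feature vectors are needed.
    """
    p = np.asarray(soft_labels, dtype=float)
    return float(np.mean(np.minimum(p, 1.0 - p)))

# Hypothetical soft labels for five instances:
print(estimate_bayes_error([0.9, 0.1, 0.5, 0.8, 0.3]))  # 0.24
```

Note that the estimator touches only the labels, never the inputs, which is what makes it model-free and instance-free as described above.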