How quickly can a given class of concepts be learned from examples? It is common to measure the performance of a supervised machine learning algorithm by plotting its "learning curve", that is, the decay of the error rate as a function of the number of training examples. However, the classical theoretical framework for understanding learnability, the PAC model of Vapnik-Chervonenkis and Valiant, does not explain the behavior of learning curves: the distribution-free PAC model of learning can only bound the upper envelope of the learning curves over all possible data distributions. This does not match the practice of machine learning, where the data source is typically fixed in any given scenario, while the learner may choose the number of training examples on the basis of factors such as computational resources and desired accuracy. In this paper, we study an alternative learning model that better captures such practical aspects of machine learning, but still gives rise to a complete theory of the learnable in the spirit of the PAC model. More precisely, we consider the problem of universal learning, which aims to understand the performance of learning algorithms on every data distribution, but without requiring uniformity over the distribution. The main result of this paper is a remarkable trichotomy: there are only three possible rates of universal learning. More precisely, we show that the learning curves of any given concept class decay at either an exponential, linear, or arbitrarily slow rate. Moreover, each of these cases is completely characterized by appropriate combinatorial parameters, and we exhibit optimal learning algorithms that achieve the best possible rate in each case. For concreteness, we consider in this paper only the realizable case, though analogous results are expected to extend to more general learning scenarios.
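To make the notion of a learning curve concrete, the following is a minimal simulation (all names hypothetical, not from the paper) of the expected error of an empirical-risk-minimizing learner for one-dimensional thresholds under a fixed uniform distribution, in the realizable case. Its curve decays on the order of 1/n, illustrating the kind of distribution-specific decay that the universal learning model studies.

```python
import random

def erm_threshold_curve(n_values, trials=300, seed=0):
    """Estimate the expected error of an ERM learner for 1-D thresholds.

    Target concept: 1{x >= 0.5}; data drawn uniformly from [0, 1], so the
    labels are realizable. The learner returns the midpoint threshold
    consistent with the sample; its error under the uniform distribution
    equals |threshold - 0.5|, the mass of the disagreement region.
    """
    rng = random.Random(seed)
    curve = []
    for n in n_values:
        total = 0.0
        for _ in range(trials):
            xs = [rng.random() for _ in range(n)]
            # Largest point labeled 0 and smallest point labeled 1 bracket
            # the set of thresholds consistent with the sample.
            below = max((x for x in xs if x < 0.5), default=0.0)
            above = min((x for x in xs if x >= 0.5), default=1.0)
            total += abs((below + above) / 2 - 0.5)
        curve.append(total / trials)
    return curve
```

Averaging over trials approximates the expected error rate at each sample size n; plotting `curve` against `n_values` would reproduce the empirical learning curve described above for this one fixed distribution, rather than the worst case over all distributions that the PAC bound controls.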