A Theory of PAC Learnability of Partial Concept Classes

We extend the theory of PAC learning in a way that allows one to model a rich variety of learning tasks in which the data satisfy special properties that ease the learning process, for example, tasks in which the distance of the data from the decision boundary is bounded away from zero. The basic and simple idea is to consider partial concepts: functions that can be undefined on certain parts of the space. When learning a partial concept, we assume that the source distribution is supported only on points where the partial concept is defined. In this way, one can naturally express assumptions on the data such as lying on a lower-dimensional surface or satisfying margin conditions. In contrast, it is not at all clear that such assumptions can be expressed by the traditional PAC theory. In fact, we exhibit easy-to-learn partial concept classes that provably cannot be captured by the traditional PAC theory. This also resolves a question posed by Attias, Kontorovich, and Mansour (2019). We characterize PAC learnability of partial concept classes and reveal an algorithmic landscape that is fundamentally different from the classical one. For example, in the classical PAC model, learning boils down to Empirical Risk Minimization (ERM). In stark contrast, we show that the ERM principle fails to explain learnability of partial concept classes. In fact, we demonstrate classes that are incredibly easy to learn, but such that any algorithm that learns them must use a hypothesis space with unbounded VC dimension. We also find that the sample compression conjecture fails in this setting. Thus, this theory features problems that can be neither represented nor solved in the traditional way. We view this as evidence that it might provide insights into the nature of learnability in realistic scenarios that the classical theory fails to explain.
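To make the notion of a partial concept concrete, here is a minimal sketch of a halfspace with margin written as a function that may be undefined. It is an illustration only, not code from the paper: the names `margin_halfspace` and `is_realizable`, the use of `None` to stand for "undefined", and the choice of a homogeneous halfspace are all assumptions made for this example.

```python
from typing import Callable, Optional, Sequence, Tuple

Point = Sequence[float]
PartialConcept = Callable[[Point], Optional[int]]


def margin_halfspace(w: Sequence[float], gamma: float) -> PartialConcept:
    """Build a partial concept from a weight vector w and a margin gamma.

    The returned function h gives 1 if <w, x> >= gamma, 0 if <w, x> <= -gamma,
    and None (undefined) when x lies strictly inside the margin.
    """
    def h(x: Point) -> Optional[int]:
        score = sum(wi * xi for wi, xi in zip(w, x))
        if score >= gamma:
            return 1
        if score <= -gamma:
            return 0
        return None  # h is undefined on points inside the margin
    return h


def is_realizable(h: PartialConcept, sample: Sequence[Tuple[Point, int]]) -> bool:
    """Return True if h is defined on every sample point and matches its label."""
    return all(h(x) == y for x, y in sample)


if __name__ == "__main__":
    h = margin_halfspace([1.0, 0.0], gamma=0.5)
    print(h([1.0, 3.0]))   # defined, positive side
    print(h([-2.0, 0.0]))  # defined, negative side
    print(h([0.1, 0.0]))   # inside the margin, so undefined
```

Under this reading, the assumption that the source distribution is supported only where the concept is defined corresponds to never drawing a point on which `h` returns `None`, so a sample containing such a point is not realizable by `h`.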
