Open-World Semi-Supervised Learning

Supervised and semi-supervised learning methods have traditionally been designed for the closed-world setting, based on the assumption that unlabeled test data contains only classes previously encountered in the labeled training data. However, the real world is inherently open and dynamic, so novel, previously unseen classes may appear in the test data or during model deployment. Here, we introduce a new open-world semi-supervised learning setting in which the model is required to recognize previously seen classes, as well as to discover novel classes never seen in the labeled dataset. To tackle the problem, we propose ORCA, an approach that learns to simultaneously classify and cluster the data. ORCA classifies examples from the unlabeled dataset into previously seen classes, or forms a novel class by grouping similar examples together. The key idea in ORCA is to introduce an uncertainty-based adaptive margin that effectively circumvents the bias caused by the imbalance of variance between seen and novel classes/clusters. We demonstrate that ORCA accurately discovers novel classes and assigns samples to previously seen classes on benchmark image classification datasets, including CIFAR and ImageNet. Remarkably, despite solving a harder task, ORCA outperforms semi-supervised methods on seen classes, as well as novel class discovery methods on novel classes, achieving 7% and 151% improvements on seen and novel classes, respectively, on the ImageNet dataset.
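The abstract names the uncertainty-based adaptive margin as ORCA's key idea but does not state its form. The following is a minimal sketch under explicit assumptions, not the paper's implementation: the uncertainty of an unlabeled example is taken as 1 minus its maximum softmax confidence, the margin is lam * u_bar where u_bar is the mean uncertainty over unlabeled examples, and that margin is added to the true-class logit of each labeled example before a scaled softmax cross-entropy. The function names, the sign of the margin, the scale factor, and the default values of lam and scale are all hypothetical choices for illustration; the paper should be consulted for the exact objective.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_uncertainty(unlabeled_logits: Sequence[Sequence[float]], scale: float = 10.0) -> float:
    """Assumed u_bar: mean of (1 - max softmax confidence) over unlabeled rows."""
    if not unlabeled_logits:
        return 0.0
    total = sum(1.0 - max(softmax([scale * v for v in row])) for row in unlabeled_logits)
    return total / len(unlabeled_logits)


def margin_cross_entropy(logits: Sequence[float], label: int, margin: float,
                         scale: float = 10.0) -> float:
    """Cross-entropy of scale * logits after adding `margin` to the true-class logit.

    Adding (rather than subtracting) the margin is an assumption of this sketch.
    """
    shifted = list(logits)
    shifted[label] += margin  # only the true-class logit is shifted
    scaled = [scale * v for v in shifted]
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in scaled))
    return log_sum_exp - scaled[label]


def adaptive_margin_loss(labeled_logits: Sequence[Sequence[float]], labels: Sequence[int],
                         unlabeled_logits: Sequence[Sequence[float]],
                         lam: float = 1.0, scale: float = 10.0) -> float:
    """Hypothetical supervised loss whose margin lam * u_bar tracks unlabeled uncertainty."""
    margin = lam * mean_uncertainty(unlabeled_logits, scale)
    losses = [margin_cross_entropy(row, y, margin, scale)
              for row, y in zip(labeled_logits, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    labeled = [[0.6, 0.1, 0.2], [0.1, 0.7, 0.0]]  # illustrative logits for 2 labeled examples
    labels = [0, 1]
    flat = [[0.2, 0.2, 0.2], [0.3, 0.25, 0.3]]  # uncertain predictions on unlabeled data
    peaked = [[0.9, -0.5, -0.5], [-0.5, 0.9, -0.5]]  # confident predictions on unlabeled data
    print("u_bar (flat):", mean_uncertainty(flat))
    print("loss (flat):", adaptive_margin_loss(labeled, labels, flat))
    print("loss (peaked):", adaptive_margin_loss(labeled, labels, peaked))
```

Under the assumed sign, high uncertainty on unlabeled data (typical early in training) yields a large margin that makes the supervised loss on seen classes easier to satisfy, slowing their learning relative to the novel clusters; as unlabeled predictions become confident, u_bar and the margin shrink toward zero and the loss reduces to ordinary cross-entropy. The sketch covers only the classification side and omits the clustering objective that groups unlabeled examples into novel classes.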
