The problem of open-set noisy labels refers to the setting where part of the training data comes from a different label space that does not contain the true classes. Many approaches, e.g., loss correction and label correction, cannot handle such open-set noisy labels well, since they require the training data and test data to share the same label space, which does not hold when learning with open-set noisy labels. State-of-the-art methods therefore employ the sample selection approach, which tries to select clean data from noisy data for network parameter updates. The discarded data are regarded as mislabeled and do not participate in training. Such an approach is intuitive and reasonable at first glance. However, a natural question arises: "can such data only be discarded during training?" In this paper, we show that the answer is no. Specifically, we argue that the instances of the discarded data may contain meaningful information for generalization. For this reason, we do not abandon such data; instead, we use instance correction to modify the instances of the discarded data, which makes the predictions for the discarded data consistent with the given labels. Instance correction is performed by targeted adversarial attacks. The corrected data are then exploited for training to help generalization. In addition to the analytical results, a series of empirical evidence is provided to justify our claims.
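Since the abstract only names the technique, a minimal sketch may help. Assuming a PyTorch classifier with inputs in [0, 1], instance correction via a targeted adversarial attack could be realized as a targeted PGD loop that perturbs each discarded instance toward its given label; the function name `instance_correction` and the budget hyperparameters `epsilon`, `alpha`, and `steps` below are illustrative assumptions, not the paper's exact procedure.

```python
# Hypothetical sketch: instance correction via a targeted PGD attack.
# We perturb a discarded instance x within an L-infinity ball so that the
# model's prediction becomes consistent with its given label y.
import torch
import torch.nn.functional as F

def instance_correction(model, x, y, epsilon=8/255, alpha=2/255, steps=10):
    """Targeted PGD: minimize the cross-entropy of the given label y
    with respect to the input, then project back into the epsilon-ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Targeted attack: step AGAINST the gradient sign to decrease
        # the loss of the target (given) label.
        x_adv = x_adv.detach() - alpha * grad.sign()
        # Project into the epsilon-ball around x and the valid pixel range.
        x_adv = torch.clamp(x_adv, x - epsilon, x + epsilon).clamp(0.0, 1.0)
    return x_adv.detach()
```

Under this reading, the corrected instances would be paired with their given labels and merged back with the selected clean data for subsequent parameter updates, rather than being discarded.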