Multi-label learning addresses problems in which each instance is associated with multiple labels simultaneously. Most existing approaches aim to improve multi-label learning performance by exploiting label correlations. Although data augmentation is widely used in many machine learning tasks, it remains unclear whether it helps multi-label learning. In this paper, we make what is, to the best of our knowledge, the first attempt to leverage data augmentation to improve the performance of multi-label learning. Specifically, we first propose a novel data augmentation approach that clusters the real examples and treats the cluster centers as virtual examples; these virtual examples naturally embody local label correlations and label importance. Then, motivated by the cluster assumption that examples in the same cluster should have the same label, we propose a novel regularization term that bridges the gap between the real and virtual examples and promotes the local smoothness of the learning function. Extensive experiments on a number of real-world multi-label data sets demonstrate that the proposed approach outperforms state-of-the-art counterparts.
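The two ideas described in the abstract (cluster centers as virtual examples, and a regularizer tying real examples to them) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clustering algorithm, the space in which clustering is performed, or the form of the regularization term. The sketch assumes plain k-means on concatenated feature and label vectors, so that each center splits into a virtual feature vector and a soft label vector, and it assumes a mean squared difference between predictions on a real example and on its cluster center as the smoothness penalty. The names `kmeans`, `make_virtual_examples`, and `smoothness_penalty` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means on a list of vectors.

    Initial centers are k evenly spaced input points (a deterministic
    choice made for reproducibility; k-means++ would be more robust).
    """
    n = len(points)
    centers = [list(points[i * n // k]) for i in range(k)]
    assign = None
    for _ in range(iters):
        new_assign = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign


def make_virtual_examples(X, Y, k):
    """Cluster joint (feature, label) vectors; centers become virtual examples.

    Assumption: clustering in the joint space, so the label part of each
    center is the average label vector of its cluster (a soft label).
    """
    d = len(X[0])
    joint = [list(x) + list(y) for x, y in zip(X, Y)]
    centers, assign = kmeans(joint, k)
    virt_X = [c[:d] for c in centers]  # virtual feature vectors
    virt_Y = [c[d:] for c in centers]  # soft virtual label vectors
    return virt_X, virt_Y, assign


def smoothness_penalty(predict, X, virt_X, assign):
    """Mean squared gap between predictions on real examples and their centers.

    Assumed form of the regularizer: it is small when the learning function
    behaves similarly on each example and on its cluster's virtual example.
    """
    total = sum(
        sq_dist(predict(x), predict(virt_X[a])) for x, a in zip(X, assign)
    )
    return total / len(X)


# Toy data: two well-separated groups with three labels each.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
Y = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
virt_X, virt_Y, assign = make_virtual_examples(X, Y, k=2)
penalty = smoothness_penalty(lambda x: x, X, virt_X, assign)
```

In this sketch, each entry of a soft virtual label vector is the fraction of cluster members carrying that label, which is one simple way a cluster center could reflect local label co-occurrence and relative label importance. In a training loop, the penalty would be added to the usual multi-label loss with a trade-off weight, which is likewise an assumption here.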