Existing semi-supervised learning (SSL) algorithms use a single weight to balance the losses of labeled and unlabeled examples; that is, all unlabeled examples are weighted equally. But not all unlabeled data are equal. In this paper we study how to assign a different weight to every unlabeled example. Manually tuning all of these weights, as prior work does for the single weight, is no longer feasible. Instead, we adjust them with an algorithm based on the influence function, a measure of how much a model depends on one training example. To make the approach efficient, we propose a fast and effective approximation of the influence function. We demonstrate that this technique outperforms state-of-the-art methods on semi-supervised image and language classification tasks.
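To make the influence-function idea concrete, here is a minimal sketch, not the paper's implementation: per-example weights on unlabeled data are adjusted using the classical influence estimate I_j = -∇L_val(θ)ᵀ H⁻¹ ∇ℓ_j(θ), where H is a damped Hessian of the training loss. The pseudo-labeling step, the `damping` term, and the weight-update learning rate are all illustrative assumptions, and logistic regression stands in for an arbitrary model.

```python
import numpy as np

# Illustrative sketch (assumptions, not the paper's method): adjust one
# weight per unlabeled example via the influence function
#   I_j = - grad_val(theta)^T  H^{-1}  grad_j(theta).
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loss(theta, x, y):
    # Gradient of the logistic loss log(1 + exp(-y * theta.x)), y in {-1, +1}.
    return -y * x * sigmoid(-y * (x @ theta))

# Tiny synthetic problem: 2-D features, labeled / unlabeled / validation sets.
d = 2
X_lab = rng.normal(size=(6, d)); y_lab = np.sign(X_lab[:, 0] + 1e-9)
X_unl = rng.normal(size=(4, d))
X_val = rng.normal(size=(5, d)); y_val = np.sign(X_val[:, 0] + 1e-9)

theta = rng.normal(scale=0.1, size=d)      # current model parameters
weights = np.ones(len(X_unl))              # one weight per unlabeled example
y_pseudo = np.sign(X_unl @ theta + 1e-9)   # pseudo-labels (an SSL assumption)

# Damped Hessian of the training loss at theta (damping is an assumption
# that keeps H invertible for this toy model).
damping = 0.1
H = damping * np.eye(d)
for x in np.vstack([X_lab, X_unl]):
    p = sigmoid(x @ theta)
    H += p * (1 - p) * np.outer(x, x)

# Influence of up-weighting each unlabeled example on the validation loss.
g_val = sum(grad_loss(theta, x, y) for x, y in zip(X_val, y_val))
H_inv_g = np.linalg.solve(H, g_val)
influence = np.array([-grad_loss(theta, x, y) @ H_inv_g
                      for x, y in zip(X_unl, y_pseudo)])

# Gradient step on the weights: down-weight examples whose up-weighting
# would increase the validation loss; clip weights at zero.
lr = 0.5
weights = np.clip(weights - lr * influence, 0.0, None)
print(weights)
```

The paper's contribution is a fast approximation of the H⁻¹ term; the exact solve above is only tractable because the toy model has two parameters.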