The predictive performance of supervised learning algorithms depends on the quality of labels. In a typical label collection process, multiple annotators provide subjective, noisy estimates of the "truth" under the influence of their varying skill levels and biases. Blindly treating these noisy labels as the ground truth limits the accuracy of learning algorithms in the presence of strong disagreement. This problem is critical for applications in domains such as medical imaging, where both the annotation cost and inter-observer variability are high. In this work, we present a method for simultaneously learning the individual annotator models and the underlying true label distribution, using only noisy observations. Each annotator is modeled by a confusion matrix that is jointly estimated along with the classifier predictions. We propose to add a regularization term to the loss function that encourages convergence to the true annotator confusion matrix. We provide a theoretical argument for why the regularization is essential to our approach, in both the single-annotator and the multiple-annotator case. Despite the simplicity of the idea, experiments on image classification tasks with both simulated and real labels show that our method either outperforms or performs on par with state-of-the-art methods and is capable of estimating the skills of annotators even with a single label available per image.
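The joint objective described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not state the form of the regularization term, so the trace penalty used here is an assumption made for illustration, as are the function name `noisy_label_loss`, the weight `lam`, and the convention that each confusion matrix is column-stochastic (entry `[i, j]` is the probability that an annotator reports class `i` when the true class is `j`). It is a sketch under those assumptions, not the authors' implementation.

```python
import numpy as np


def noisy_label_loss(probs, confusions, labels, mask, lam=0.01):
    """Cross-entropy on noisy labels plus an assumed trace penalty.

    probs:      (N, C) classifier estimate of the true label distribution.
    confusions: (R, C, C) one confusion matrix per annotator, where
                confusions[r, i, j] = P(annotator r reports i | true class j).
    labels:     (N, R) integer labels; entries are ignored where mask is False.
    mask:       (N, R) boolean, True where annotator r labeled sample n.
    lam:        hypothetical regularization weight.
    """
    # Predicted noisy-label distribution per sample and annotator: (N, R, C).
    noisy = np.einsum("rij,nj->nri", confusions, probs)

    # Probability assigned to each label that was actually observed.
    n_idx, r_idx = np.nonzero(mask)
    picked = noisy[n_idx, r_idx, labels[n_idx, r_idx]]
    cross_entropy = -np.mean(np.log(picked + 1e-12))

    # Assumed regularizer: sum of the traces of all confusion matrices.
    trace_penalty = np.trace(confusions, axis1=1, axis2=2).sum()
    return cross_entropy + lam * trace_penalty


# Two samples, three classes, two annotators with identity confusion matrices.
probs = np.eye(3)[[0, 2]]
confusions = np.stack([np.eye(3), np.eye(3)])
labels = np.array([[0, 0], [2, 2]])
full_mask = np.ones((2, 2), dtype=bool)
loss_full = noisy_label_loss(probs, confusions, labels, full_mask)

# Only one label available per image: annotator 0 for sample 0, annotator 1 for sample 1.
single_mask = np.array([[True, False], [False, True]])
loss_single = noisy_label_loss(probs, confusions, labels, single_mask)
```

In a full training setup, `probs` would come from a neural network and `confusions` would be trainable parameters updated by gradient descent alongside the network weights; the mask is what allows the loss to be computed when each image carries only a single annotator's label.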