Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

Data is the engine of modern computer vision, which necessitates collecting large-scale datasets. Doing so is expensive, and guaranteeing label quality is a major challenge. In this paper, we investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images. While methods that exploit learnt models for labeling exist, a surprisingly prevalent approach is to query humans for a fixed number of labels per datum and aggregate them, which is expensive. Building on prior work on online joint probabilistic modeling of human annotations and machine-generated beliefs, we propose modifications and best practices aimed at minimizing human labeling effort. Specifically, we make use of advances in self-supervised learning, view annotation as a semi-supervised learning problem, identify and mitigate pitfalls, and ablate several key design choices to propose effective guidelines for labeling. Our analysis is done in a more realistic simulation that involves querying human labelers, which uncovers issues with evaluation using existing worker simulation methods. Simulated experiments on a 125k-image subset of ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average, a 2.7x and 6.7x improvement over prior work and manual annotation, respectively. Project page: https://fidler-lab.github.io/efficient-annotation-cookbook
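To make the contrast with fixed-count labeling concrete, the following is a minimal sketch of the general idea of combining a machine-generated belief with human labels online and stopping once the belief is confident. It is not the authors' implementation. The symmetric worker-noise model, the names `update_posterior` and `annotate`, and the `accuracy`, `threshold`, and `max_queries` parameters are all assumptions introduced for illustration.

```python
import numpy as np

def update_posterior(prior, worker_label, accuracy):
    """Bayes update of a class belief after one human label.

    Assumption (not from the paper): the worker reports the true class
    with probability `accuracy`, otherwise a uniformly random other class.
    """
    prior = np.asarray(prior, dtype=float)
    k = prior.shape[0]
    likelihood = np.full(k, (1.0 - accuracy) / (k - 1))
    likelihood[worker_label] = accuracy
    posterior = prior * likelihood
    return posterior / posterior.sum()

def annotate(machine_belief, query_worker, accuracy=0.8,
             threshold=0.9, max_queries=5):
    """Query humans only until the belief in some class reaches `threshold`.

    `machine_belief` stands in for a model's class probabilities and acts
    as the prior; `query_worker` is a hypothetical callable that returns
    one human label (an integer class index) per call.
    """
    belief = np.asarray(machine_belief, dtype=float)
    belief = belief / belief.sum()
    n_queries = 0
    while belief.max() < threshold and n_queries < max_queries:
        belief = update_posterior(belief, query_worker(), accuracy)
        n_queries += 1
    return int(belief.argmax()), n_queries, belief

# A confident machine belief needs no human label at all.
label_a, cost_a, _ = annotate([0.95, 0.03, 0.02], lambda: 0)

# An uninformative belief needs two agreeing labels under these settings.
label_b, cost_b, _ = annotate([1 / 3, 1 / 3, 1 / 3], lambda: 1)
```

Under this kind of scheme the average cost per image can fall below one annotation, because images the model is already confident about are never sent to a human. Taking the reported figure at face value, 0.35 annotations per image over 125k images corresponds to roughly 43,750 human labels in total.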
