Autonomous driving systems require huge amounts of data for training. Manually annotating this data is time-consuming and prohibitively expensive, and active learning has therefore emerged as a way to ease this effort and make annotation more manageable. In this paper, we introduce a novel active learning approach for object detection in videos that exploits temporal coherence. Our active learning criterion is based on the estimated number of errors, in terms of false positives and false negatives. The detections produced by the object detector define the nodes of a graph and are tracked forward and backward in time to link the nodes temporally. Minimizing an energy function defined on this graphical model yields estimates of both false positives and false negatives. Additionally, we introduce a synthetic video dataset, called SYNTHIA-AL, specially designed to evaluate active learning for video object detection in road scenes. Finally, we show that our approach outperforms active learning baselines on two datasets.
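The graph-based error estimation described above can be sketched as follows. This is an illustrative simplification under stated assumptions, not the paper's exact formulation: the names `Node`, `unary`, `pairwise`, and `min_energy_labels` are hypothetical, the pairwise weight `w` is a made-up parameter, and the temporal links (which the paper obtains by tracking detections forward and backward) are assumed here to form a simple chain per track, so the energy can be minimized exactly with dynamic programming.

```python
# Illustrative sketch (an assumption, not the paper's exact energy):
# each detection is a graph node with a confidence score; a binary label
# (1 = true detection, 0 = false positive) is assigned per node by
# minimizing unary (detector confidence) plus pairwise (temporal
# coherence) costs along a chain of temporally linked detections.
from dataclasses import dataclass

@dataclass
class Node:
    frame: int    # frame index of the detection (hypothetical field)
    score: float  # detector confidence in [0, 1]

def unary(node, label):
    # Cost of labeling a detection true (1) or a false positive (0):
    # confident detections are cheap to keep, weak ones cheap to discard.
    return (1.0 - node.score) if label == 1 else node.score

def pairwise(la, lb, w):
    # Temporal-coherence term: temporally linked detections prefer
    # the same label; disagreement costs w.
    return w if la != lb else 0.0

def min_energy_labels(track, w=0.5):
    # Exact minimization along one chain of temporally linked detections
    # via dynamic programming (Viterbi-style).
    n = len(track)
    cost = [[unary(track[0], 0), unary(track[0], 1)]]
    cost += [[0.0, 0.0] for _ in range(n - 1)]
    back = [[0, 0] for _ in range(n)]
    for i in range(1, n):
        for l in (0, 1):
            cands = [cost[i - 1][p] + pairwise(p, l, w) for p in (0, 1)]
            best = min((0, 1), key=lambda p: cands[p])
            cost[i][l] = unary(track[i], l) + cands[best]
            back[i][l] = best
    labels = [min((0, 1), key=lambda l: cost[-1][l])]
    for i in range(n - 1, 0, -1):
        labels.append(back[i][labels[-1]])
    return labels[::-1]  # one label per detection in the track
```

With the temporal term active, a low-confidence detection sandwiched between confident ones is relabeled as true (i.e., estimated to be a false negative of a thresholded detector); with `w = 0` the temporal term vanishes and each detection is judged in isolation.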