Fine-tuning a pre-trained network is commonly thought to improve data efficiency. However, He et al. have called into question the utility of pre-training by showing that training from scratch can often yield similar performance, provided the model trains long enough. We show that although pre-training may not improve performance on traditional classification metrics, it does provide large benefits to model robustness and uncertainty. Through extensive experiments on label corruption, class imbalance, adversarial examples, out-of-distribution detection, and confidence calibration, we demonstrate large gains from pre-training and complementary effects with task-specific methods. On CIFAR-10 and CIFAR-100, we show approximately a 30% relative improvement in label noise robustness and a 10% absolute improvement in adversarial robustness. In some cases, using pre-training without task-specific methods surpasses the state-of-the-art, highlighting the importance of using pre-training when evaluating future methods on robustness and uncertainty tasks.
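The regime the abstract compares against can be sketched in a few lines: initialize from "pre-trained" weights, attach a fresh task-specific head, and fine-tune the whole network, rather than training from a random initialization. The sketch below is a minimal, hypothetical illustration in PyTorch; the tiny randomly-initialized backbone merely stands in for a network whose weights would, in practice, come from large-scale pre-training (e.g. on ImageNet), and none of the names or hyperparameters are taken from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained backbone; in a real experiment
# these weights would be loaded from a large-scale pre-training run.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

# Fresh head for the downstream task (e.g. CIFAR-10's 10 classes).
head = nn.Linear(8, 10)
model = nn.Sequential(backbone, head)

# Fine-tune everything with a small learning rate, instead of training
# the same architecture from scratch.
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
x = torch.randn(4, 3, 32, 32)          # a toy batch of 32x32 RGB images
y = torch.randint(0, 10, (4,))         # toy labels
loss = nn.CrossEntropyLoss()(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()
print(model(x).shape)  # torch.Size([4, 10])
```

The only structural difference from training from scratch is the initialization of `backbone`; the abstract's claim is that this difference, while often invisible in test accuracy, shows up strongly in robustness and uncertainty metrics.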