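Kaiming He received his bachelor's degree from Tsinghua University and his PhD from the Multimedia Laboratory at the Chinese University of Hong Kong. He joined Microsoft Research Asia (MSRA) in 2011, where he worked mainly on computer vision and deep learning. In 2016 he joined Facebook AI Research (FAIR) as a research scientist.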

VIP Content

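This paper does not propose a new method. Instead, given recent progress in computer vision, we study a simple, incremental, yet must-know baseline: self-supervised learning for Vision Transformers (ViT). Training recipes for standard convolutional networks are mature and robust, but recipes for ViT have yet to be established, and training is especially challenging in the self-supervised setting.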

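Here we go back to basics and investigate how several fundamental components affect the training of self-supervised ViT. We find that instability is the main cause of accuracy degradation, and that it can be masked by seemingly good results (training can easily settle into a local optimum). Our experiments show that these results are in fact partial failures, and that they improve further once training is made stable. We benchmark ViT in MoCo v3 and several other self-supervised frameworks and analyze it from multiple angles. We discuss the positive evidence observed so far, the remaining challenges, and open questions, and we hope this work provides useful data points and practical experience for future research.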

Latest Papers

Tuning a pre-trained network is commonly thought to improve data efficiency. However, Kaiming He et al. have called into question the utility of pre-training by showing that training from scratch can often yield similar performance, should the model train long enough. We show that although pre-training may not improve performance on traditional classification metrics, it does provide large benefits to model robustness and uncertainty. Through extensive experiments on label corruption, class imbalance, adversarial examples, out-of-distribution detection, and confidence calibration, we demonstrate large gains from pre-training and complementary effects with task-specific methods. We show approximately a 30% relative improvement in label noise robustness and a 10% absolute improvement in adversarial robustness on CIFAR-10 and CIFAR-100. In some cases, using pre-training without task-specific methods surpasses the state-of-the-art, highlighting the importance of using pre-training when evaluating future methods on robustness and uncertainty tasks.
