Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone come at considerable cost in compute and energy. Here we focus on the scaling of error with dataset size and show how, in both theory and practice, we can break beyond power law scaling and instead achieve exponential scaling, provided we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this exponential scaling prediction empirically as a function of pruned dataset size, and indeed observe better-than-power-law scaling for ResNets trained on CIFAR-10, SVHN, and ImageNet. Given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find that most existing high-performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore develop a new, simple, cheap, and scalable self-supervised pruning metric that demonstrates performance comparable to the best supervised metrics. Overall, our work suggests that the discovery of good data pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.
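The abstract describes pruning as ranking training examples with a per-example metric and keeping only a chosen fraction of the dataset, with the self-supervised variant scoring examples without labels. The sketch below is a minimal illustration of that pipeline under explicit assumptions: it scores each example by its distance to the nearest k-means centroid in a (self-supervised) embedding space and retains the hardest fraction. The function name, the use of k-means, the cluster count, and the "keep hardest" rule are illustrative assumptions for this sketch, not necessarily the paper's exact metric or keep policy.

```python
import numpy as np
from sklearn.cluster import KMeans

def prune_by_self_supervised_score(embeddings, keep_fraction, n_clusters=10, seed=0):
    """Illustrative pruning metric (an assumption, not the paper's exact recipe):
    score each example by its distance to the nearest k-means centroid computed
    on self-supervised embeddings, then keep the hardest `keep_fraction`.

    embeddings: (N, D) array of per-example features from any encoder.
    Returns the sorted indices of the examples to retain.
    """
    kmeans = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(embeddings)
    # Distance of each example to its assigned centroid: larger = less prototypical / harder.
    dists = np.linalg.norm(embeddings - kmeans.cluster_centers_[kmeans.labels_], axis=1)
    order = np.argsort(-dists)                       # hardest examples first
    n_keep = int(round(keep_fraction * len(embeddings)))
    return np.sort(order[:n_keep])

# Toy usage: prune a random "dataset" of 1000 embeddings down to 60% of its size.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))
kept_indices = prune_by_self_supervised_score(X, keep_fraction=0.6)
print(len(kept_indices))  # -> 600
```

In practice the kept indices would be used to subsample the training set before fitting the downstream model; sweeping `keep_fraction` is what produces the error-versus-pruned-dataset-size curves discussed in the abstract.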