Up to 100x Faster Data-free Knowledge Distillation

Data-free knowledge distillation (DFKD) has recently attracted increasing attention from the research community because it can compress a model using only synthetic data. Despite encouraging results, state-of-the-art DFKD methods still suffer from inefficient data synthesis, which makes data-free training extremely time-consuming and thus inapplicable to large-scale tasks. In this work, we introduce an efficacious scheme, termed FastDFKD, that accelerates DFKD by orders of magnitude. At the heart of our approach is a novel strategy that reuses the common features shared across the training data to synthesize different data instances. Unlike prior methods that optimize a set of data independently, we propose to learn a meta-synthesizer that seeks common features as the initialization for fast data synthesis. As a result, FastDFKD achieves data synthesis within only a few steps, significantly enhancing the efficiency of data-free training. Experiments on CIFAR, NYUv2, and ImageNet demonstrate that the proposed FastDFKD achieves 10$\times$ and even 100$\times$ acceleration while preserving performance on par with the state of the art.
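The idea of a learned initialization that makes few-step synthesis possible can be illustrated with a toy sketch. This is not the FastDFKD implementation: the abstract does not specify the update rule, so the Reptile-style first-order meta-update below is an assumption, and the quadratic objective stands in for the real synthesis loss computed against a teacher network. All names (`synthesize`, `meta_train`, `make_target`, `common`) are hypothetical. Each "instance" is a target vector that shares a common component, and the sketch compares a fixed budget of 5 synthesis steps from a zero start against the same budget from the meta-learned start.

```python
import random


def loss(x, target):
    """Toy synthesis loss: 0.5 * ||x - target||^2 (stand-in for a teacher-based loss)."""
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def synthesize(init, target, steps, lr=0.1):
    """Run `steps` gradient-descent steps on the toy loss, starting from `init`."""
    x = list(init)
    for _ in range(steps):
        # Gradient of 0.5 * (xi - ti)^2 with respect to xi is (xi - ti).
        x = [xi - lr * (xi - ti) for xi, ti in zip(x, target)]
    return x


def make_target(rng, common, noise=0.1):
    """One instance: the shared common features plus small instance-specific noise."""
    return [c + rng.gauss(0.0, noise) for c in common]


def meta_train(common, rounds=200, inner_steps=5, meta_lr=0.5, seed=0):
    """Learn a shared initialization with a Reptile-style update (an assumption here)."""
    rng = random.Random(seed)
    init = [0.0] * len(common)
    for _ in range(rounds):
        target = make_target(rng, common)
        x = synthesize(init, target, inner_steps)
        # Move the initialization part of the way toward the few-step solution.
        init = [m + meta_lr * (xi - m) for m, xi in zip(init, x)]
    return init


common = [3.0, -2.0, 5.0, 1.0]           # hypothetical shared features
meta_init = meta_train(common)

new_target = make_target(random.Random(123), common)  # an unseen instance
budget = 5                                # the same small step budget for both runs
loss_scratch = loss(synthesize([0.0] * len(common), new_target, budget), new_target)
loss_fast = loss(synthesize(meta_init, new_target, budget), new_target)

print("loss from zero init:", loss_scratch)
print("loss from meta init:", loss_fast)
```

In this toy setting the meta-learned start already encodes the shared component, so the few available steps only need to fit the instance-specific residual. That is the intuition behind few-step synthesis; the actual speedups reported above come from the paper's experiments, not from this sketch.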
