Performing knowledge transfer from a large teacher network to a smaller student is a popular task in modern deep learning applications. However, due to growing dataset sizes and stricter privacy regulations, it is increasingly common not to have access to the data that was used to train the teacher. We propose a novel method that trains a student to match the predictions of its teacher without using any data or metadata. We achieve this by training an adversarial generator to search for images on which the student poorly matches the teacher, and then using these images to train the student. Our resulting student closely approximates its teacher for simple datasets like SVHN, and on CIFAR10 we improve on the state-of-the-art for few-shot distillation (with 100 images per class), despite using no data. Finally, we also propose a metric to quantify the degree of belief matching between teacher and student in the vicinity of decision boundaries, and observe a significantly higher match between our zero-shot student and the teacher than between a student distilled with real data and the teacher. Code is available at: https://github.com/polo5/ZeroShotKnowledgeTransfer
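The adversarial search described above can be illustrated with a toy sketch. This is not the authors' implementation, and several choices are assumptions made only for illustration: the teacher and student are tiny linear softmax classifiers on 2-D inputs rather than deep networks, the mismatch is measured with a KL divergence (the abstract only says the student "poorly matches" the teacher), and the learned generator is replaced by a gradient-free random search that keeps the candidate input with the largest mismatch. All function names are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Linear classifier (one weight row per class) followed by softmax."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def mismatch(teacher, student, x):
    """Assumed mismatch measure: KL(teacher(x) || student(x))."""
    return kl_divergence(predict(teacher, x), predict(student, x))


def run_toy_distillation(steps=300, candidates=16, lr=0.5, seed=0):
    """Alternate 'find a hard input' and 'fit the student on it'.

    Returns the mean teacher-student KL on random probe points
    before and after training. No real data is used anywhere.
    """
    rng = random.Random(seed)

    def sample():
        # Pseudo-input drawn from noise (stand-in for a generated image).
        return [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]

    teacher = [[2.0, 0.0], [0.0, 2.0], [-2.0, -2.0]]  # fixed "pretrained" weights
    student = [[0.0, 0.0] for _ in teacher]           # starts out uninformed
    probes = [sample() for _ in range(200)]           # evaluation only

    def mean_kl():
        return sum(mismatch(teacher, student, x) for x in probes) / len(probes)

    before = mean_kl()
    for _ in range(steps):
        # Adversarial step (random search replaces the trained generator):
        # keep the candidate on which the student matches the teacher worst.
        x = max((sample() for _ in range(candidates)),
                key=lambda c: mismatch(teacher, student, c))
        # Student step: gradient descent on KL(teacher || student).
        # For softmax outputs the gradient w.r.t. the student logits is (s - t).
        t, s = predict(teacher, x), predict(student, x)
        for k, row in enumerate(student):
            for j in range(len(row)):
                row[j] -= lr * (s[k] - t[k]) * x[j]
    return before, mean_kl()


if __name__ == "__main__":
    before, after = run_toy_distillation()
    print(f"mean KL before: {before:.4f}, after: {after:.4f}")
```

In this sketch the student only ever sees inputs drawn from noise, yet its predictions move toward the teacher's because each update targets the input where the two currently disagree most. A learned generator, as in the abstract, plays the same role but can search a high-dimensional image space far more efficiently than random sampling.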