torchdistill: A Modular, Configuration-Driven Framework for Knowledge Distillation

Although knowledge distillation (transfer) has been attracting attention from the research community, recent developments in the field have heightened the need for reproducible studies and highly generalized frameworks that lower the barriers to high-quality, reproducible deep learning research. Several researchers have voluntarily published the frameworks used in their knowledge distillation studies to help other interested researchers reproduce their original work. Such frameworks, however, are usually neither well generalized nor maintained, so researchers still have to write a lot of code to refactor or build on them when introducing new methods, models, and datasets or designing experiments. In this paper, we present our open-source framework, built on PyTorch and dedicated to knowledge distillation studies. The framework is designed to let users design experiments through declarative PyYAML configuration files, and it helps researchers complete the recently proposed ML Code Completeness Checklist. Using the framework, we demonstrate its various efficient training strategies and implement a variety of knowledge distillation methods. We also reproduce some of their original experimental results on the ImageNet and COCO datasets, as presented at major machine learning conferences such as ICLR, NeurIPS, CVPR, and ECCV, including recent state-of-the-art methods. All the source code, configurations, log files, and trained model weights are publicly available at https://github.com/yoshitomo-matsubara/torchdistill .
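
To make the configuration-driven idea concrete, the following is a minimal sketch of how a declarative YAML file could select and parameterize a distillation loss. It is not torchdistill's actual schema or API: the configuration keys, the `CRITERIA` registry, and the helpers `register_criterion` and `build_criterion` are all hypothetical names introduced for illustration. The loss is the classic temperature-scaled formulation of Hinton et al., swapped in as a familiar example and written in plain Python rather than PyTorch so the sketch only needs PyYAML.

```python
import math

import yaml  # PyYAML

# Hypothetical experiment description. These keys are illustrative only
# and are NOT torchdistill's real configuration schema.
CONFIG_TEXT = """
criterion:
  name: kd
  params:
    temperature: 4.0
    alpha: 0.5
"""

# Hypothetical registry mapping names used in the config to loss functions.
CRITERIA = {}


def register_criterion(name):
    """Decorator that records a loss function under a config-visible name."""
    def decorator(fn):
        CRITERIA[name] = fn
        return fn
    return decorator


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


@register_criterion("kd")
def kd_loss(student_logits, teacher_logits, label, temperature, alpha):
    """alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    hard = -math.log(softmax(student_logits)[label])
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(soft_teacher, soft_student))
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


def build_criterion(config):
    """Look up the loss named in the config and bind its parameters."""
    spec = config["criterion"]
    fn = CRITERIA[spec["name"]]
    params = spec.get("params", {})
    return lambda student, teacher, label: fn(student, teacher, label, **params)


config = yaml.safe_load(CONFIG_TEXT)
criterion = build_criterion(config)
loss = criterion([2.0, 0.5, -1.0], [3.0, 1.0, -2.0], 0)
print(f"distillation loss: {loss:.4f}")
```

In this sketch, changing the temperature or swapping in another registered loss only requires editing the YAML text, which is the kind of code-free experiment design the abstract describes.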
