Large, pre-trained generative models have become increasingly popular and useful to both the research community and the wider public. Specifically, BigGANs---a class of class-conditional Generative Adversarial Networks trained on ImageNet---achieved excellent, state-of-the-art capability in generating realistic photos. However, fine-tuning or training BigGANs from scratch is practically impossible for most researchers and engineers because (1) GAN training is often unstable and suffers from mode collapse; and (2) the training requires a significant amount of computation: 256 Google TPU cores for 2 days or 8 V100 GPUs for 15 days. Importantly, many pre-trained generative models, in both the NLP and image domains, have been found to contain biases that are harmful to society. Thus, we need computationally feasible methods for modifying and re-purposing these huge, pre-trained models for downstream tasks. In this paper, we propose a cost-effective optimization method for improving and re-purposing BigGANs by fine-tuning only the class-embedding layer. We show the effectiveness of our model-editing approach on three tasks: (1) significantly improving the realism and diversity of samples from complete mode-collapse classes; (2) re-purposing ImageNet BigGANs to generate images for Places365; and (3) de-biasing, or improving the sample diversity of, selected ImageNet classes.
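The core idea of fine-tuning only the class-embedding layer can be sketched as parameter freezing: all generator weights are frozen except the class embedding, so the optimizer touches only a tiny fraction of the model. The snippet below is a minimal illustration with a hypothetical toy generator (`ToyConditionalGenerator` is our own stand-in, not the paper's code or the real BigGAN architecture); the freezing pattern is what matters.

```python
import torch
import torch.nn as nn

# Toy stand-in for a class-conditional generator such as BigGAN.
# (Hypothetical module for illustration only; real BigGANs are far larger.)
class ToyConditionalGenerator(nn.Module):
    def __init__(self, num_classes=10, embed_dim=16, noise_dim=8, out_dim=32):
        super().__init__()
        # The class-embedding layer: the only part we will fine-tune.
        self.class_embedding = nn.Embedding(num_classes, embed_dim)
        # Stand-in for the large, frozen generator backbone.
        self.backbone = nn.Linear(noise_dim + embed_dim, out_dim)

    def forward(self, z, y):
        e = self.class_embedding(y)
        return self.backbone(torch.cat([z, e], dim=1))

G = ToyConditionalGenerator()

# Freeze everything, then unfreeze only the class-embedding layer.
for p in G.parameters():
    p.requires_grad = False
for p in G.class_embedding.parameters():
    p.requires_grad = True

# The optimizer sees only the embedding parameters -> cheap fine-tuning.
opt = torch.optim.Adam((p for p in G.parameters() if p.requires_grad), lr=1e-3)

trainable = sum(p.numel() for p in G.parameters() if p.requires_grad)
total = sum(p.numel() for p in G.parameters())
```

In this toy setup only 160 of 960 parameters are trainable; in a real BigGAN the class embedding is likewise a vanishingly small share of the full model, which is what makes this editing approach computationally feasible.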