Multilabel conditional image generation is a challenging problem in computer vision. In this work we propose the Multi-ingredient Pizza Generator (MPG), a conditional Generative Adversarial Network (GAN) framework for synthesizing multilabel images. We build MPG on StyleGAN2, a state-of-the-art GAN architecture, and develop a new conditioning technique that forces intermediate feature maps to learn scale-wise label information. Because multilabel image generation is a complex problem, we also regularize the synthetic images by predicting their corresponding ingredients, and we encourage the discriminator to distinguish between matched and mismatched images. To verify the efficacy of MPG, we test it on Pizza10, a carefully annotated multi-ingredient pizza image dataset. MPG successfully generates photo-realistic pizza images with the desired ingredients. The framework can be easily extended to other multilabel image generation scenarios.
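The two auxiliary signals mentioned above, ingredient prediction and matched-versus-mismatched discrimination, can be illustrated with a minimal sketch. This is not the MPG implementation: the function names `multilabel_bce` and `make_mismatched`, the choice of binary cross-entropy as the ingredient-prediction loss, the single-label flip used to build a mismatched label vector, and the four-slot toy labels are all assumptions made for illustration.

```python
import math
import random


def multilabel_bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over independent ingredient labels.

    Assumed form of the ingredient-prediction regularizer: each
    ingredient is treated as its own present/absent decision.
    """
    assert len(probs) == len(labels)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def make_mismatched(labels, rng):
    """Copy a binary label vector and flip one randomly chosen entry.

    Hypothetical way to obtain a label vector that no longer matches
    the image it is paired with.
    """
    idx = rng.randrange(len(labels))
    wrong = list(labels)
    wrong[idx] = 1 - wrong[idx]
    return wrong


# Toy label vector with four ingredient slots (arbitrary size).
true_labels = [1, 0, 1, 0]
rng = random.Random(0)

# Pairs a discriminator could be trained on: (labels, target), where
# target 1 means "labels match the image" and 0 means "mismatched".
pairs = [
    (true_labels, 1),
    (make_mismatched(true_labels, rng), 0),
]

# Regularizer value for a classifier that predicts the ingredients
# of a synthetic image fairly well.
predicted = [0.9, 0.1, 0.8, 0.2]
print(multilabel_bce(predicted, true_labels))
print(pairs)
```

In a full training loop, the first quantity would be added to the generator loss, and the discriminator would receive each image together with its matched or mismatched label vector; both choices are part of the sketch's assumptions, not details stated in the abstract.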