In this paper, we propose a multi-stage, high-resolution image synthesis model that takes fine-grained attributes and masks as input. The fine-grained attributes let the model precisely constrain the features of the generated image through their rich semantic information. The mask serves as a prior that constrains the generated images to be visually plausible, reducing the unexpected diversity of samples produced by the generative adversarial network. This paper also proposes a scheme that improves the discriminator of the generative adversarial network by discriminating the whole image and its sub-regions simultaneously. In addition, we propose a method for refining the labeled attributes in datasets, which reduces manual labeling noise. Extensive quantitative results show that our image synthesis model generates more realistic images.
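The whole-image-plus-sub-region discrimination described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: `score` is a stand-in for a learned discriminator network, the sub-regions are assumed to be the four quadrants, and the weight `lam` balancing the global and local terms is a hypothetical parameter.

```python
import numpy as np

def score(img):
    # Placeholder "discriminator": maps mean intensity to (0, 1).
    # A real model would be a learned network producing a realism score.
    return 1.0 / (1.0 + np.exp(-img.mean()))

def quadrants(img):
    # Split the image into four sub-regions (an assumed partitioning).
    h, w = img.shape[0] // 2, img.shape[1] // 2
    return [img[:h, :w], img[:h, w:], img[h:, :w], img[h:, w:]]

def combined_score(img, lam=0.5):
    # Discriminate the whole image and its sub-regions simultaneously,
    # then blend the two terms with weight `lam` (illustrative choice).
    global_s = score(img)
    local_s = np.mean([score(q) for q in quadrants(img)])
    return lam * global_s + (1 - lam) * local_s

img = np.random.default_rng(0).normal(size=(64, 64))
print(combined_score(img))
```

In training, both terms would feed the adversarial loss, so the generator is penalized for artifacts visible at either the global or the sub-region scale.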