In recent years, Generative Adversarial Networks have become ubiquitous in both research and public perception, but how GANs convert an unstructured latent code to a high-quality output is still an open question. In this work, we investigate regression into the latent space as a probe to understand the compositional properties of GANs. We find that combining the regressor and a pretrained generator provides a strong image prior, allowing us to create composite images from a collage of random image parts at inference time while maintaining global consistency. To compare compositional properties across different generators, we measure the trade-offs between reconstruction of the unrealistic input and image quality of the regenerated samples. We find that the regression approach enables more localized editing of individual image parts compared to direct editing in the latent space, and we conduct experiments to quantify this independence effect. Our method is agnostic to the semantics of edits and does not require labels or predefined concepts during training. Beyond image composition, our method extends to a number of related applications, such as image inpainting or example-based image editing, which we demonstrate on several GANs and datasets; because it uses only a single forward pass, it can operate in real time. Code is available on our project page: https://chail.github.io/latent-composition/.
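As a concrete illustration of the collage-and-regenerate pipeline described above, the sketch below pastes two image parts together under a binary mask, regresses the collage into the latent space with an encoder, and decodes it with the generator in a single forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the tiny `Generator` and `Encoder` classes are toy stand-ins for a pretrained GAN generator and its latent regressor (see the project page for the real models), and `composite_and_regenerate` is a hypothetical helper named for this example.

```python
import torch
import torch.nn as nn

LATENT_DIM, IMG = 16, 8  # toy sizes for illustration only


class Generator(nn.Module):
    """Stand-in for a pretrained generator G: z -> image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(LATENT_DIM, 3 * IMG * IMG)

    def forward(self, z):
        return self.net(z).view(-1, 3, IMG, IMG)


class Encoder(nn.Module):
    """Stand-in for the latent regressor E: image -> z."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(3 * IMG * IMG, LATENT_DIM)

    def forward(self, x):
        return self.net(x.flatten(1))


@torch.no_grad()
def composite_and_regenerate(G, E, parts, masks):
    """Paste image `parts` together under binary `masks`, regress the
    collage into the latent space, and decode it. One forward pass through
    E and G, so the operation runs in real time; G acts as the image prior
    that turns the incoherent collage into a globally consistent image."""
    collage = sum(p * m for p, m in zip(parts, masks))  # crude cut-and-paste input
    z = E(collage)                                      # regress collage -> latent code
    return G(z)                                         # regenerate a coherent image


G, E = Generator(), Encoder()
part_a = torch.rand(1, 3, IMG, IMG)
part_b = torch.rand(1, 3, IMG, IMG)
mask = torch.zeros(1, 1, IMG, IMG)
mask[..., : IMG // 2] = 1.0  # take the left half from part_a
out = composite_and_regenerate(G, E, [part_a, part_b], [mask, 1.0 - mask])
print(out.shape)  # torch.Size([1, 3, 8, 8])
```

Note that no labels or edit semantics appear anywhere in this flow, consistent with the abstract's claim: the collage itself is the only specification of the desired edit, and the generator's prior supplies the global consistency.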