In this paper, we leverage advances in neural networks to form a neural rendering pipeline for controllable image generation, bypassing the need for detailed modeling in a conventional graphics pipeline. To this end, we present Neural Graphics Pipeline (NGP), a hybrid generative model that brings together neural and traditional image formation models. NGP decomposes the image into a set of interpretable appearance feature maps, uncovering direct control handles for controllable image generation. To form an image, NGP generates coarse 3D models that are fed into neural rendering modules to produce view-specific interpretable 2D maps, which are then composited into the final output image using a traditional image formation model. Our approach offers control over image generation by providing direct handles for illumination and camera parameters, in addition to control over shape and appearance variations. The key challenge is to learn these controls through unsupervised training that links generated coarse 3D models with unpaired real images via neural and traditional (e.g., Blinn-Phong) rendering functions, without establishing an explicit correspondence between them. We demonstrate the effectiveness of our approach on controllable image generation of single-object scenes. We evaluate our hybrid modeling framework, compare with neural-only generation methods (namely, DCGAN, LSGAN, WGAN-GP, VON, and SRNs), report improvement in FID scores against real images, and demonstrate that NGP supports direct controls common in traditional forward rendering. Code is available at http://geometry.cs.ucl.ac.uk/projects/2021/ngp.