In this paper we address the benefit of adding adversarial training to the task of monocular depth estimation. A model can be trained in a self-supervised setting on stereo pairs of images, where depth (in the form of disparities) is an intermediate result in a right-to-left image reconstruction pipeline. For the quality of the image reconstruction and disparity prediction, a combination of different losses is used, including L1 image reconstruction losses and left-right disparity smoothness. These are local pixel-wise losses, while depth prediction requires global consistency. Therefore, we extend the self-supervised network to become a Generative Adversarial Network (GAN), by including a discriminator which should tell apart reconstructed (fake) images from real images. We evaluate Vanilla GANs, LSGANs and Wasserstein GANs in combination with different pixel-wise reconstruction losses. Based on extensive experimental evaluation, we conclude that adversarial training is beneficial if and only if the reconstruction loss is not too constrained. Even though adversarial training seems promising because it promotes global consistency, non-adversarial training outperforms (or is on par with) any method trained with a GAN when a constrained reconstruction loss is used in combination with batch normalisation. Based on the insights of our experimental evaluation, we obtain state-of-the-art monocular depth estimation results by using batch normalisation and different output scales.
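The loss combination described above can be illustrated with a minimal, hypothetical 1-D sketch: an L1 reconstruction term, a disparity smoothness penalty, and a Vanilla (non-saturating) GAN generator term, combined with weights. All function names and the weight values (`w_rec`, `w_smooth`, `w_adv`) are illustrative assumptions, not the paper's implementation, which operates on full 2-D images with a learned discriminator.

```python
import math

def l1_loss(recon, target):
    # Pixel-wise L1 reconstruction loss (mean absolute error).
    return sum(abs(r - t) for r, t in zip(recon, target)) / len(recon)

def smoothness_loss(disparity):
    # Penalise large gradients in the predicted disparity map (1-D sketch;
    # the paper uses an edge-aware 2-D smoothness term).
    n = len(disparity) - 1
    return sum(abs(disparity[i + 1] - disparity[i]) for i in range(n)) / max(n, 1)

def generator_gan_loss(d_fake):
    # Vanilla (non-saturating) GAN generator term: -log D(fake),
    # averaged over the discriminator's outputs on reconstructed images.
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

def total_loss(recon, target, disparity, d_fake,
               w_rec=1.0, w_smooth=0.1, w_adv=0.01):
    # Weighted sum of the three terms; the weights here are arbitrary examples.
    return (w_rec * l1_loss(recon, target)
            + w_smooth * smoothness_loss(disparity)
            + w_adv * generator_gan_loss(d_fake))
```

A perfect reconstruction with a constant disparity map and a fully fooled discriminator drives every term, and hence the total, to zero.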