We study end-to-end learning strategies for 3D shape inference from images, in particular from a single image. Several approaches in this direction have been investigated, exploring different shape representations and suitable learning architectures. We focus instead on the underlying probabilistic mechanisms and contribute a more principled, probabilistic inference-based reconstruction framework, which we call Probabilistic Reconstruction Networks. This framework expresses image-conditioned 3D shape inference through a family of latent variable models and naturally decouples the choice of shape representation from the inference itself. Moreover, it suggests different options for the image conditioning and allows training in two regimes, using either a Monte Carlo or a variational approximation of the marginal likelihood. Using our Probabilistic Reconstruction Networks, we obtain single-image 3D reconstruction results that set a new state of the art on the ShapeNet dataset in terms of the intersection-over-union and earth mover's distance evaluation metrics. Interestingly, we obtain these results using a basic voxel grid representation, improving over recent work based on finer point-cloud or mesh-based representations.
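As an illustration of the intersection-over-union metric named above, the following is a minimal sketch for voxel grids. The function name `voxel_iou`, the 0.5 occupancy threshold, and the convention for two empty grids are assumptions made for this example, not details taken from the paper's evaluation protocol.

```python
import numpy as np


def voxel_iou(pred_occupancy, target, threshold=0.5):
    """Intersection over union between a predicted and a reference voxel grid.

    pred_occupancy: array of per-voxel occupancy scores (assumed in [0, 1]).
    target: array of the same shape holding the reference occupancy.
    threshold: assumed cut-off used to binarize the prediction.
    """
    pred = np.asarray(pred_occupancy) >= threshold
    gt = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Assumed convention: two empty grids count as a perfect match.
    if union == 0:
        return 1.0
    return float(intersection) / float(union)


# Toy 4x4x4 example: two slabs of 32 voxels each that overlap in 16 voxels.
target = np.zeros((4, 4, 4), dtype=bool)
target[:2] = True
pred = np.zeros((4, 4, 4))
pred[1:3] = 0.9
iou = voxel_iou(pred, target)  # 16 shared voxels out of 48 in the union
```

In this toy case the two slabs share one of three occupied layers, so the score is one third; a higher value indicates closer agreement between the predicted and reference shapes.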