Deep learning has greatly improved the realism of animatable human models by learning geometry and appearance from collections of 3D scans, template meshes, and multi-view imagery. High-resolution models enable photo-realistic avatars, but they require studio settings that are not available to end users. Our goal is to create avatars directly from raw images without relying on expensive studio setups and surface tracking. While a few such approaches exist, they generalize poorly and are prone to learning spurious (chance) correlations between irrelevant body parts, resulting in implausible deformations and missing body parts on unseen poses. We introduce a three-stage method that incorporates two inductive biases to better disentangle pose-dependent deformation. First, we model correlations of body parts explicitly with a graph neural network. Second, to further reduce the effect of chance correlations, we introduce localized per-bone features that use a factorized volumetric representation and a new aggregation function. We demonstrate that our model produces realistic body shapes under challenging unseen poses and achieves high-quality image synthesis. Our proposed representation strikes a better trade-off between model capacity, expressiveness, and robustness than competing methods. Project website: https://lemonatsu.github.io/danbo.
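As a rough illustration of the first inductive bias (modeling body-part correlations explicitly over the skeleton graph), the sketch below runs one message-passing step on a toy four-bone chain. It is not the authors' implementation: the skeleton `PARENTS`, the feature sizes, the mean-over-neighbors aggregation, and the helper names `build_adjacency` and `gnn_layer` are all assumptions chosen for illustration.

```python
import numpy as np


def build_adjacency(parents):
    """Symmetric bone adjacency (no self-loops) from parent indices; the root has parent -1."""
    n = len(parents)
    adj = np.zeros((n, n), dtype=np.float64)
    for child, parent in enumerate(parents):
        if parent >= 0:
            adj[child, parent] = 1.0
            adj[parent, child] = 1.0
    return adj


def gnn_layer(features, adj, w_self, w_neigh):
    """One message-passing step: each bone combines its own feature with the
    mean feature of its directly connected bones, followed by a ReLU."""
    degree = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj @ features) / np.maximum(degree, 1.0)
    return np.maximum(features @ w_self + neigh_mean @ w_neigh, 0.0)


# Hypothetical toy skeleton: a chain 0 -> 1 -> 2 -> 3 (not a real body model).
PARENTS = [-1, 0, 1, 2]

rng = np.random.default_rng(0)
pose = rng.normal(size=(4, 6))     # assumed: one 6-D pose feature per bone
w_self = rng.normal(size=(6, 8))   # assumed: random stand-ins for learned weights
w_neigh = rng.normal(size=(6, 8))

adj = build_adjacency(PARENTS)
out = gnn_layer(pose, adj, w_self, w_neigh)

# Perturb only the last bone: after one layer, bones that are not adjacent
# to it (bones 0 and 1) are unaffected.
pose_perturbed = pose.copy()
pose_perturbed[3] += 1.0
out_perturbed = gnn_layer(pose_perturbed, adj, w_self, w_neigh)
```

In this sketch a single layer lets a bone influence only its direct neighbors, so stacking fewer layers keeps distant body parts from interacting. This is one simple way to express the kind of locality that the abstract argues reduces chance correlations between irrelevant body parts.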