We present ARCH++, an image-based method to reconstruct 3D avatars with arbitrary clothing styles. Our reconstructed avatars are animation-ready and highly realistic, in both the regions visible from the input views and the unseen regions. While prior work shows great promise in reconstructing animatable clothed humans with various topologies, we observe fundamental limitations that result in sub-optimal reconstruction quality. In this paper, we revisit the major steps of image-based avatar reconstruction and address these limitations with ARCH++. First, we introduce an end-to-end point-based geometry encoder that better describes the semantics of the underlying 3D human body, replacing previous hand-crafted features. Second, to address the occupancy ambiguity caused by topological changes of clothed humans in the canonical pose, we propose a co-supervising framework with cross-space consistency that jointly estimates occupancy in both the posed and canonical spaces. Last, we use image-to-image translation networks to further refine detailed geometry and texture on the reconstructed surface, which improves fidelity and consistency across arbitrary viewpoints. In our experiments, we demonstrate improvements over the state of the art in reconstruction quality and realism on public benchmarks and in user studies.
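To make the cross-space co-supervision in the second step concrete, the sketch below pairs a posed-space and a canonical-space occupancy head on shared per-point features and ties their predictions with a consistency term. This is a minimal illustration of the idea, not the paper's implementation; all module, function, and parameter names (CrossSpaceOccupancy, co_supervision_loss, w_consistency, etc.) are hypothetical.

```python
import torch
import torch.nn as nn

class CrossSpaceOccupancy(nn.Module):
    """Two occupancy heads over shared per-point features: one queried in
    the posed space, one in the canonical space (hypothetical sketch)."""
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.posed_head = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))
        self.canonical_head = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, feats, pts_posed, pts_canonical):
        # Query each head with the point feature plus the 3D coordinate
        # of the same query point expressed in the respective space.
        occ_p = torch.sigmoid(
            self.posed_head(torch.cat([feats, pts_posed], dim=-1)))
        occ_c = torch.sigmoid(
            self.canonical_head(torch.cat([feats, pts_canonical], dim=-1)))
        return occ_p, occ_c

def co_supervision_loss(occ_p, occ_c, gt_p, gt_c, w_consistency=1.0):
    """Supervise each space against its own ground truth and penalize
    disagreement between the two predictions for corresponding points."""
    bce = nn.functional.binary_cross_entropy
    consistency = (occ_p - occ_c).abs().mean()
    return bce(occ_p, gt_p) + bce(occ_c, gt_c) + w_consistency * consistency
```

Under this reading, the consistency term is what resolves the canonical-space ambiguity: points whose canonical-space occupancy is ill-defined after topological changes still receive a signal from the unambiguous posed-space prediction.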