Prior work for articulated 3D shape reconstruction often relies on specialized sensors (e.g., synchronized multi-camera systems) or pre-built 3D deformable models (e.g., SMAL or SMPL). Such methods do not scale to diverse sets of objects in the wild. We present BANMo, a method that requires neither a specialized sensor nor a pre-defined template shape. BANMo builds high-fidelity, articulated 3D models (including shape and animatable skinning weights) from many monocular casual videos in a differentiable rendering framework. While the use of many videos provides more coverage of camera views and object articulations, it introduces significant challenges in establishing correspondence across scenes with different backgrounds, illumination conditions, etc. Our key insight is to merge three schools of thought: (1) classic deformable shape models that make use of articulated bones and blend skinning, (2) volumetric neural radiance fields (NeRFs) that are amenable to gradient-based optimization, and (3) canonical embeddings that generate correspondences between pixels and an articulated model. We introduce neural blend skinning models that allow for differentiable and invertible articulated deformations. When combined with canonical embeddings, such models allow us to establish dense correspondences across videos that can be self-supervised with cycle consistency. On real and synthetic datasets, BANMo shows higher-fidelity 3D reconstructions than prior work for humans and animals, with the ability to render realistic images from novel viewpoints and poses. Project webpage: banmo-www.github.io .
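To make the blend-skinning ingredient concrete, the following is a minimal NumPy sketch of classic linear blend skinning (the first school of thought above), not BANMo's neural variant: each canonical point is transformed by every bone's rigid transform, and the results are blended by per-point skinning weights. The function and argument names are illustrative, not from the paper's code.

```python
import numpy as np

def blend_skinning(points, skin_weights, bone_transforms):
    """Deform canonical points with linear blend skinning (illustrative sketch).

    points:          (N, 3) canonical-space points
    skin_weights:    (N, B) per-point weights over B bones (each row sums to 1)
    bone_transforms: (B, 3, 4) rigid transforms [R | t], one per bone
    """
    n = len(points)
    # Homogeneous coordinates: (N, 4)
    homo = np.concatenate([points, np.ones((n, 1))], axis=1)
    # Apply every bone transform to every point: (B, N, 3)
    per_bone = np.einsum('bij,nj->bni', bone_transforms, homo)
    # Blend the per-bone results with the skinning weights: (N, 3)
    return np.einsum('nb,bni->ni', skin_weights, per_bone)
```

With identity bone transforms the deformation is a no-op, which is a quick sanity check; BANMo's contribution is to make the weights and transforms themselves neural and optimizable, so that the mapping is differentiable and invertible.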