We present DogMo, a large-scale multi-view RGB-D video dataset capturing diverse canine movements for the task of motion recovery from images. DogMo comprises 1.2k motion sequences collected from 10 unique dogs, offering rich variation in both motion and breed. It addresses key limitations of existing dog motion datasets, including the lack of multi-view and real 3D data, as well as limited scale and diversity. Leveraging DogMo, we establish four motion recovery benchmark settings that support systematic evaluation across monocular and multi-view, RGB and RGB-D inputs. To facilitate accurate motion recovery, we further introduce a three-stage, instance-specific optimization pipeline that fits the SMAL model to the motion sequences. Our method progressively refines body shape and pose through coarse alignment, dense correspondence supervision, and temporal regularization. Our dataset and method provide a principled foundation for advancing research in dog motion recovery and open up new directions at the intersection of computer vision, computer graphics, and animal behavior modeling.
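To make the three-stage fitting procedure concrete, the following is a minimal sketch of how an instance-specific optimization of this kind could be structured, assuming a PyTorch-style setup. The SMAL forward pass is replaced by a mocked stand-in, and all tensor sizes, names (`smal_forward`, `betas`, `poses`, `trans`), and loss weights are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: three-stage fitting of a parametric body model to per-frame
# point clouds (coarse alignment -> dense correspondences -> temporal smoothing).
import torch

T, J, V = 10, 35, 1000          # frames, joints, vertices (illustrative sizes)
torch.manual_seed(0)

# Stand-in for the SMAL model: vertices as a function of shape (betas) and
# per-frame translation; a real implementation would use linear blend skinning
# driven by the per-joint pose parameters as well.
shape_basis = torch.randn(V, 3, 10) * 0.01
template = torch.randn(V, 3)
def smal_forward(betas, poses, trans):
    verts = template + shape_basis @ betas              # (V, 3)
    verts = verts.unsqueeze(0) + trans.unsqueeze(1)     # (T, V, 3); pose ignored in mock
    return verts

# Synthetic "observations": per-frame point clouds and dense point-to-vertex
# correspondences (in practice these would come from the RGB-D captures).
target_points = torch.randn(T, 512, 3)
corr_idx = torch.randint(0, V, (T, 512))

betas = torch.zeros(10, requires_grad=True)
poses = torch.zeros(T, J, 3, requires_grad=True)
trans = torch.zeros(T, 3, requires_grad=True)

def chamfer_one_side(pred, target):
    # distance from each observed point to its nearest predicted vertex
    d = torch.cdist(target, pred)                       # (T, N, V)
    return d.min(dim=-1).values.mean()

stages = [
    # stage 1: coarse alignment (here only translation; a full model would also
    # optimize global rotation)
    dict(params=[trans], steps=50, use_corr=False, w_temp=0.0),
    # stage 2: dense correspondence supervision refining shape and pose
    dict(params=[betas, poses, trans], steps=100, use_corr=True, w_temp=0.0),
    # stage 3: temporal regularization for smooth motion
    dict(params=[betas, poses, trans], steps=100, use_corr=True, w_temp=10.0),
]

for stage in stages:
    opt = torch.optim.Adam(stage["params"], lr=1e-2)
    for _ in range(stage["steps"]):
        opt.zero_grad()
        verts = smal_forward(betas, poses, trans)
        if stage["use_corr"]:
            # dense 3D correspondence loss: each matched vertex should land on
            # its observed point
            matched = torch.gather(verts, 1, corr_idx.unsqueeze(-1).expand(-1, -1, 3))
            loss = (matched - target_points).pow(2).sum(-1).mean()
        else:
            loss = chamfer_one_side(verts, target_points)
        # temporal smoothness on pose and translation between consecutive frames
        loss = loss + stage["w_temp"] * (
            (poses[1:] - poses[:-1]).pow(2).mean()
            + (trans[1:] - trans[:-1]).pow(2).mean()
        )
        loss.backward()
        opt.step()
```

The staged schedule mirrors the description above: earlier stages constrain only a few global parameters so later, denser objectives start from a reasonable initialization, and the temporal term is introduced last so it smooths an already well-fitted trajectory rather than fighting the data terms.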