In this paper, we consider the challenging task of simultaneously locating and recovering multiple hands from a single 2D image. Previous studies either focus on single-hand reconstruction or solve this problem in a multi-stage way. The conventional two-stage pipeline first detects hand regions and then estimates the 3D hand pose from each cropped patch. To reduce the computational redundancy in preprocessing and feature extraction, we propose a concise yet efficient single-stage pipeline. Specifically, we design a multi-head auto-encoder structure for multi-hand reconstruction, in which the head networks share the same feature map and output the hand center, pose, and texture, respectively. In addition, we adopt a weakly-supervised scheme to alleviate the burden of expensive real-world 3D annotations. To this end, we propose a series of losses optimized by a stage-wise training scheme, where a multi-hand dataset with 2D annotations is generated from publicly available single-hand datasets. To further improve the accuracy of the weakly-supervised model, we adopt several feature consistency constraints in both single-hand and multi-hand settings. Specifically, the keypoints of each hand estimated from local features should be consistent with the re-projected points predicted from global features. Extensive experiments on public benchmarks, including FreiHAND, HO3D, InterHand2.6M, and RHD, demonstrate that our method outperforms state-of-the-art model-based methods in both weakly-supervised and fully-supervised settings. The code and models are available at https://github.com/zijinxuxu/SMHR.
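The consistency constraint between locally estimated keypoints and re-projected points can be illustrated with a minimal sketch. This is not the implementation from the released code: the weak-perspective camera, the mean-squared-distance form of the loss, and the names `project_weak_perspective` and `consistency_loss` are all assumptions made here for illustration only.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def project_weak_perspective(
    joints_3d: Sequence[Point3D], scale: float, trans: Point2D
) -> List[Point2D]:
    """Project 3D joints to 2D as (s*x + tx, s*y + ty).

    Assumption: a weak-perspective camera; depth is ignored.
    """
    tx, ty = trans
    return [(scale * x + tx, scale * y + ty) for x, y, _ in joints_3d]


def consistency_loss(
    local_kpts_2d: Sequence[Point2D],
    joints_3d: Sequence[Point3D],
    scale: float,
    trans: Point2D,
) -> float:
    """Mean squared distance between 2D keypoints estimated from local
    features and the re-projection of 3D joints predicted from global
    features (hypothetical loss form, chosen for illustration)."""
    reprojected = project_weak_perspective(joints_3d, scale, trans)
    if len(reprojected) != len(local_kpts_2d):
        raise ValueError("keypoint counts must match")
    total = sum(
        (u - p) ** 2 + (v - q) ** 2
        for (u, v), (p, q) in zip(local_kpts_2d, reprojected)
    )
    return total / len(local_kpts_2d)
```

Under these assumptions the loss is zero when the two sets of points coincide and grows with their squared pixel distance, so minimizing it pulls the local and global predictions for each hand toward agreement without requiring 3D ground truth.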