Human matting refers to extracting human subjects from natural images with high quality, preserving fine details such as hair, glasses, and hats. This technology plays an essential role in image compositing and visual effects in the film industry. When a green screen is unavailable, existing human matting methods either require auxiliary inputs (such as a trimap or a background image) or rely on models with high computational cost and complex network structures, which makes human matting difficult to apply in practice. To alleviate these problems, most existing methods (such as MODNet) adopt a multi-branch design in which segmentation paves the way for matting, but they do not make full use of the image features and only use the network's prediction results as guidance information. Therefore, we propose a module that generates a foreground probability map and integrate it into MODNet to obtain the Semantic Guided Matting Net (SGM-Net). With only a single image as input, our method can accomplish the human matting task. We evaluate our method on the P3M-10k dataset. Compared with the benchmark, our method achieves significant improvements on various evaluation metrics.
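As a rough illustration of the guidance idea described above, the following PyTorch sketch shows how a small head could predict a coarse foreground probability map from shared encoder features and how a matting branch could consume that map as an extra input channel. This is not the authors' released code: the module names, channel sizes, and the concatenation-based fusion are illustrative assumptions.

```python
# Minimal sketch (assumed design, not the official SGM-Net implementation):
# a head predicts a foreground probability map from encoder features, and a
# matting branch fuses that map with the features to predict the alpha matte.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ForegroundProbHead(nn.Module):
    """Predicts a 1-channel foreground probability map from encoder features."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, in_channels // 2, 3, padding=1),
            nn.BatchNorm2d(in_channels // 2),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // 2, 1, 1),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.conv(feat))  # probabilities in [0, 1]

class GuidedMattingBranch(nn.Module):
    """Fuses image features with the foreground probability map to predict alpha."""
    def __init__(self, feat_channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(feat_channels + 1, feat_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, 1, 1),
        )

    def forward(self, feat: torch.Tensor, fg_prob: torch.Tensor) -> torch.Tensor:
        # Resize the probability map to the feature resolution and concatenate
        # it as an extra channel, so the matting branch receives semantic
        # guidance derived from image features rather than only a final
        # segmentation prediction.
        fg_prob = F.interpolate(fg_prob, size=feat.shape[2:],
                                mode="bilinear", align_corners=False)
        return torch.sigmoid(self.fuse(torch.cat([feat, fg_prob], dim=1)))

if __name__ == "__main__":
    feat = torch.randn(1, 64, 64, 64)   # hypothetical shared encoder features
    fg_head = ForegroundProbHead(64)
    matting = GuidedMattingBranch(64)
    fg_prob = fg_head(feat)             # coarse foreground probability map
    alpha = matting(feat, fg_prob)      # guided alpha matte prediction
    print(fg_prob.shape, alpha.shape)   # both: torch.Size([1, 1, 64, 64])
```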