We present a systematic study on a new task called dichotomous image segmentation (DIS), which aims to segment objects from natural images with high accuracy. To this end, we collected the first large-scale dataset, called DIS5K, which contains 5,470 high-resolution (e.g., 2K, 4K or larger) images covering camouflaged, salient, or meticulous objects in various backgrounds. All images are annotated with extremely fine-grained labels. In addition, we introduce a simple intermediate supervision baseline (IS-Net) using both feature-level and mask-level guidance for DIS model training. Without tricks, IS-Net outperforms various cutting-edge baselines on the proposed DIS5K, making it a general self-learned supervision network that can help facilitate future research in DIS. Further, we design a new metric called human correction efforts (HCE), which approximates the number of mouse clicking operations required to correct the false positives and false negatives. HCE is used to measure the gap between models and real-world applications and thus can complement existing metrics. Finally, we conduct the largest-scale benchmark, evaluating 16 representative segmentation models, providing a more insightful discussion regarding object complexities, and showing several potential applications (e.g., background removal, art design, 3D reconstruction). We hope these efforts can open up promising directions for both academia and industry. We will release our DIS5K dataset, IS-Net baseline, HCE metric, and the complete benchmark results.
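To make the idea behind HCE concrete, the following is a minimal sketch of a much cruder proxy, not the HCE metric itself. It assumes that each connected false-positive or false-negative region costs one correction operation, and it counts those regions between a predicted binary mask and a ground-truth mask. The function names `count_regions` and `correction_effort_proxy`, and the one-operation-per-region assumption, are ours for illustration only.

```python
from collections import deque


def count_regions(grid):
    """Count 4-connected regions of truthy cells in a 2D list (breadth-first search)."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            # Found an unvisited error pixel: it starts a new region.
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


def correction_effort_proxy(pred, gt):
    """Hypothetical proxy: one correction per connected FP or FN region.

    This is an illustrative assumption, not the HCE definition.
    """
    # False positives: predicted foreground where the ground truth is background.
    fp = [[bool(p) and not bool(g) for p, g in zip(p_row, g_row)]
          for p_row, g_row in zip(pred, gt)]
    # False negatives: ground-truth foreground that the prediction missed.
    fn = [[bool(g) and not bool(p) for p, g in zip(p_row, g_row)]
          for p_row, g_row in zip(pred, gt)]
    return count_regions(fp) + count_regions(fn)


gt = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
effort = correction_effort_proxy(pred, gt)  # 2 FP regions + 1 FN region
```

Unlike pixel-overlap scores, a count of this kind grows with the number of separate mistakes rather than with their area, which is the sense in which an effort-based measure can complement existing metrics.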