In industrial manufacturing, the deployment of deep learning models for visual inspection is hindered mainly by the high and often prohibitive cost of collecting and annotating large-scale training datasets. While image synthesis from 3D CAD models is a common solution, the individual techniques of domain and rendering randomization used to create rich synthetic training datasets have been studied mainly in simple domains. Hence, their effectiveness on complex industrial tasks with densely arranged and similar objects remains unclear. In this paper, we investigate the sim-to-real generalization performance of standard object detectors on the complex industrial application of terminal strip object detection, carefully combining randomization and domain knowledge. We describe step by step the creation of our image synthesis pipeline, which achieves high realism with minimal implementation effort, and explain how this approach could be transferred to other industrial settings. Moreover, we created a dataset comprising 30,000 synthetic images and 300 manually annotated real images of terminal strips, which is publicly available for reference and future research. To provide a baseline as a lower bound on the performance that can be expected in these challenging industrial parts detection tasks, we report the sim-to-real generalization performance of standard object detectors on our dataset when trained exclusively on synthetic data. While all considered models behave similarly, the transformer-based DINO model achieves the best score with 98.40% mean average precision on the real test set, demonstrating that our pipeline enables high-quality detections in complex industrial environments from existing CAD data and with manageable image synthesis effort.