In this work, we propose an open-world object detection method that learns from image-caption pairs to detect novel object classes along with a given set of known classes. Training proceeds in two stages: the first uses a location-guided image-caption matching technique to learn class labels for both novel and known classes in a weakly supervised manner, and the second specializes the model for the object detection task using known-class annotations. We show that a simple language model is better suited than a large contextualized language model for detecting novel objects. Moreover, we introduce a consistency-regularization technique to better exploit the information in image-caption pairs. Our method compares favorably to existing open-world detection approaches while being data-efficient.