Open-vocabulary object detection aims to give object detectors trained on a fixed set of object categories the ability to detect objects described by arbitrary text queries. Previous methods use knowledge distillation to extract knowledge from Pretrained Vision-and-Language Models (PVLMs) and transfer it to detectors. However, because their proposal cropping is non-adaptive and their feature mimicking is single-level, they suffer from information destruction during knowledge extraction and from inefficient knowledge transfer. To remedy these limitations, we propose an Object-Aware Distillation Pyramid (OADP) framework, which comprises an Object-Aware Knowledge Extraction (OAKE) module and a Distillation Pyramid (DP) mechanism. When extracting object knowledge from PVLMs, OAKE adaptively transforms object proposals and adopts object-aware mask attention to obtain precise and complete knowledge of objects. DP introduces global and block distillation, which compensate for the relation information missing in object distillation and make knowledge transfer more comprehensive. Extensive experiments show that our method achieves significant improvement over current methods. In particular, on the MS-COCO dataset, our OADP framework reaches $35.6$ mAP$^{\text{N}}_{50}$, surpassing the current state-of-the-art method by $3.3$ mAP$^{\text{N}}_{50}$. Code is released at https://github.com/LutingWang/OADP.
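To make the idea of distilling at several levels concrete, the following is a minimal sketch of a combined object, block, and global feature-mimicking loss. It is an illustration under stated assumptions, not the released OADP implementation: the abstract does not specify the loss form, so the L1 distance on L2-normalized embeddings, the equal default weights, and the names `mimic_loss` and `pyramid_loss` are all hypothetical choices.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each embedding (last axis) to unit L2 norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def mimic_loss(student, teacher):
    """Mean absolute difference between normalized embeddings.

    Assumption: an L1 feature-mimicking loss; the abstract does not
    state which distance the authors use.
    """
    return float(np.mean(np.abs(l2_normalize(student) - l2_normalize(teacher))))


def pyramid_loss(student, teacher, weights=None):
    """Weighted sum of mimicking losses over three levels.

    `student` and `teacher` map a level name to an array of shape
    (num_items, embedding_dim). The level names follow the abstract;
    the weighting scheme is hypothetical.
    """
    levels = ("object", "block", "global")
    if weights is None:
        weights = {level: 1.0 for level in levels}
    return sum(
        weights[level] * mimic_loss(student[level], teacher[level])
        for level in levels
    )
```

In this sketch the teacher embeddings would come from the PVLM and the student embeddings from the detector. Because both sides are normalized, the loss ignores embedding magnitude and penalizes only differences in direction.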