The introduction of negative labels (NLs) has proven effective in enhancing Out-of-Distribution (OOD) detection. However, existing methods lack an understanding of OOD images and therefore struggle to construct an accurate negative space. Moreover, the absence of negative labels semantically close to in-distribution (ID) labels limits their effectiveness in near-OOD detection. To address these issues, we propose shaping an Adaptive Negative Textual Space (ANTS) by leveraging the understanding and reasoning capabilities of multimodal large language models (MLLMs). Specifically, we cache historical test images that are likely to be OOD samples and prompt the MLLM to describe them, generating expressive negative sentences that precisely characterize the OOD distribution and enhance far-OOD detection. For the near-OOD setting, where OOD samples resemble a subset of the ID classes, we cache the ID classes that are visually similar to historical test images and leverage MLLM reasoning to generate visually similar negative labels tailored to this subset, effectively reducing false negatives and improving near-OOD detection. To balance these two types of negative textual spaces, we design an adaptive weighted score that allows the method to handle both OOD task settings (near-OOD and far-OOD), making it highly adaptable in open environments. On the ImageNet benchmark, ANTS reduces FPR95 by 3.1\%, establishing a new state of the art. Furthermore, our method is training-free and zero-shot, making it highly scalable.