Multi-prompt learning methods have emerged as an effective approach for rapidly adapting vision-language models to downstream tasks under limited resources. Existing multi-prompt learning methods primarily focus on utilizing various carefully designed prompts within a single foundation vision-language model to achieve superior performance. However, these methods overlook the model-prompt matching bias that hinders the development of multi-prompt learning: the same prompt can convey different semantics across distinct vision-language models, such as CLIP-ViT-B/16 and CLIP-ViT-B/32, yielding inconsistent predictions for an identical prompt. To mitigate the impact of this bias on downstream tasks, we explore an ensemble learning approach that fully aggregates the benefits of these diverse predictions. We further reveal a sample-prompt matching bias, which originates from the prompt-irrelevant semantics encapsulated in input samples. Consequently, directly using all information in an input sample to generate ensemble weights leads to suboptimal performance. In response, guided by an information theory-based analysis, we extract prompt-relevant semantics from input samples and adaptively compute debiased ensemble weights. Overall, we propose Adaptive-Debiased Ensemble MultiPrompt Learning, abbreviated as AmPLe, to mitigate both types of bias simultaneously. Extensive experiments on three representative tasks, i.e., generalization to novel classes, new target datasets, and unseen domain shifts, show that AmPLe consistently outperforms existing methods. Theoretical validation from a causal perspective further supports the effectiveness of AmPLe.
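To make the adaptive weighting idea concrete, below is a minimal sketch (not the authors' implementation) of sample-adaptive ensembling over predictions from two backbones such as CLIP-ViT-B/16 and CLIP-ViT-B/32. All names here (AdaptiveEnsemble, relevance_proj, gate) are hypothetical, and the information theory-guided extraction of prompt-relevant semantics is reduced to a learned projection for brevity.

```python
# Minimal, illustrative sketch of sample-adaptive ensemble weighting.
# Assumption: per-model class predictions and image features are already
# computed by the respective vision-language backbones.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveEnsemble(nn.Module):
    def __init__(self, feat_dim: int, num_models: int):
        super().__init__()
        # Hypothetical stand-in for the information-theoretic filtering:
        # a projection intended to retain only prompt-relevant semantics.
        self.relevance_proj = nn.Linear(feat_dim, feat_dim)
        # Gating head mapping filtered features to per-model weights.
        self.gate = nn.Linear(feat_dim, num_models)

    def forward(self, image_feats: torch.Tensor, per_model_logits: torch.Tensor):
        # image_feats: (batch, feat_dim) features of the input sample
        # per_model_logits: (batch, num_models, num_classes) predictions
        # from each backbone under the same prompt set
        relevant = F.relu(self.relevance_proj(image_feats))
        weights = F.softmax(self.gate(relevant), dim=-1)  # (batch, num_models)
        # Weighted aggregation of the per-model class predictions.
        fused = (weights.unsqueeze(-1) * per_model_logits).sum(dim=1)
        return fused, weights

# Toy usage with random stand-ins for backbone features and logits.
batch, feat_dim, num_models, num_classes = 4, 512, 2, 10
ensemble = AdaptiveEnsemble(feat_dim, num_models)
feats = torch.randn(batch, feat_dim)
logits = torch.randn(batch, num_models, num_classes)
fused_logits, w = ensemble(feats, logits)
print(fused_logits.shape, w.shape)  # torch.Size([4, 10]) torch.Size([4, 2])
```

Because the gating weights are computed from (filtered) features of each input sample rather than fixed per model, the ensemble can favor whichever backbone's prompt semantics best match that sample, which is the intuition behind the debiased weighting described above.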