Recent work on multimodal out-of-context (OOC) misinformation detection has made remarkable progress in checking consistency across modalities to support or refute image-text pairs. However, existing OOC misinformation detection methods tend to emphasize internal consistency and overlook the significance of external consistency between image-text pairs and external evidence. In this paper, we propose HiEAG, a novel Hierarchical Evidence-Augmented Generation framework that refines external consistency checking by leveraging the extensive knowledge of multimodal large language models (MLLMs). Our approach decomposes external consistency checking into a comprehensive pipeline that integrates reranking and rewriting in addition to retrieval. The evidence reranking module uses Automatic Evidence Selection Prompting (AESP) to select the relevant evidence item from the results of evidence retrieval. Subsequently, the evidence rewriting module uses Automatic Evidence Generation Prompting (AEGP) to improve task adaptation for MLLM-based OOC misinformation detectors. Furthermore, our approach provides explanations for its judgments and achieves impressive performance with instruction tuning. Experimental results on different benchmark datasets demonstrate that HiEAG surpasses previous state-of-the-art (SOTA) methods in accuracy over all samples.
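To make the staged structure described above concrete, the following is a minimal illustrative sketch of a retrieve, rerank, rewrite, and detect flow. It is not the HiEAG implementation: the function names, the prompt wording, the word-overlap retriever, and the `stub_llm` stand-in for an MLLM are all assumptions introduced here for illustration, and the image is represented by a text description only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in for an MLLM call: prompt text in, reply text out.
LLM = Callable[[str], str]


@dataclass
class Sample:
    image_desc: str  # placeholder for the image; a real system would pass pixels
    caption: str


def retrieve(sample: Sample, corpus: List[str], k: int = 3) -> List[str]:
    """Toy retrieval (assumption): rank evidence by word overlap with the caption."""
    query = set(sample.caption.lower().split())
    ranked = sorted(
        corpus,
        key=lambda e: len(query & set(e.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def rerank(sample: Sample, candidates: List[str], llm: LLM) -> str:
    """Selection-style prompting: ask the model for the index of the best item."""
    if not candidates:
        return ""
    listing = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    reply = llm(
        f"TASK: select\nCaption: {sample.caption}\nEvidence:\n{listing}\n"
        "Reply with the index of the most relevant item."
    )
    digits = "".join(ch for ch in reply if ch.isdigit())
    idx = int(digits) if digits else 0
    # Fall back to the top retrieved item if the reply is out of range.
    return candidates[idx] if idx < len(candidates) else candidates[0]


def rewrite(sample: Sample, evidence: str, llm: LLM) -> str:
    """Generation-style prompting: restate the evidence for the detector."""
    return llm(
        f"TASK: rewrite\nCaption: {sample.caption}\nEvidence: {evidence}\n"
        "Rewrite the evidence as one concise statement."
    ).strip()


def detect(sample: Sample, evidence: str, llm: LLM) -> Tuple[bool, str]:
    """Ask for a verdict plus explanation; parse the leading label."""
    reply = llm(
        f"TASK: judge\nImage: {sample.image_desc}\nCaption: {sample.caption}\n"
        f"Evidence: {evidence}\nAnswer 'OOC' or 'NOT-OOC', then explain."
    )
    return reply.strip().upper().startswith("OOC"), reply


def run_pipeline(sample: Sample, corpus: List[str], llm: LLM) -> Tuple[bool, str]:
    candidates = retrieve(sample, corpus)
    best = rerank(sample, candidates, llm)
    rewritten = rewrite(sample, best, llm)
    return detect(sample, rewritten, llm)


def stub_llm(prompt: str) -> str:
    """Deterministic fake model so the sketch runs without any external service."""
    if prompt.startswith("TASK: select"):
        return "0"
    if prompt.startswith("TASK: rewrite"):
        return "The photo shows a 2015 flood in Chennai."
    return "OOC: the evidence places the photo in a different event."


sample = Sample(image_desc="flooded street", caption="Flood in Houston in 2017")
corpus = [
    "Chennai flood 2015 street photo",
    "Recipe for bread",
    "Houston weather report",
]
is_ooc, explanation = run_pipeline(sample, corpus, stub_llm)
```

Passing the model as a plain callable keeps each stage testable in isolation; swapping `stub_llm` for a real MLLM client would only require a function with the same string-in, string-out shape.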