Recent advances in AI-driven image generation have introduced new challenges for verifying the authenticity of digital evidence in forensic investigations. Modern generative models can produce visually consistent forgeries that evade traditional detectors based on pixel or compression artefacts. Most existing approaches also lack an explicit measure of anomaly intensity, which limits their ability to quantify the severity of manipulation. This paper introduces Vision-Attention Anomaly Scoring (VAAS), a dual-module framework that integrates global attention-based anomaly estimation using Vision Transformers (ViT) with patch-level self-consistency scoring derived from SegFormer embeddings. The hybrid formulation yields a continuous, interpretable anomaly score that reflects both the location and the degree of manipulation. Evaluations on the DF2023 and CASIA v2.0 datasets demonstrate that VAAS achieves competitive F1 and IoU performance while enhancing visual explainability through attention-guided anomaly maps. The framework bridges quantitative detection with human-understandable reasoning, supporting transparent and reliable image integrity assessment. The source code for all experiments and the materials needed to reproduce the results are openly available.
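The fusion of the two modules into a single continuous score can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the min-max normalization, and the weighted-blend parameter `alpha` are all assumptions made for illustration; it only shows how a global attention anomaly map and a patch-level self-consistency map might be combined into one interpretable score in [0, 1].

```python
import numpy as np

def hybrid_anomaly_score(attention_map, consistency_map, alpha=0.5):
    """Illustrative fusion of a global attention anomaly map and a
    patch-level self-consistency map into one continuous score.

    Both inputs are 2-D arrays on arbitrary scales; each is min-max
    normalized to [0, 1] before the weighted blend. `alpha` weights
    the global attention term against the consistency term.
    (Hypothetical sketch, not the VAAS reference implementation.)
    """
    def normalize(m):
        m = m.astype(float)
        rng = m.max() - m.min()
        return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

    a = normalize(attention_map)          # high attention anomaly -> suspicious
    c = 1.0 - normalize(consistency_map)  # low self-consistency -> suspicious
    return alpha * a + (1.0 - alpha) * c  # continuous anomaly map in [0, 1]
```

A per-pixel map like this supports both a severity estimate (the magnitude of the score) and localization (where the score concentrates), matching the dual role described above.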