Existing detectors are often trained on biased datasets, making them prone to overfitting to non-causal image attributes that are spuriously correlated with the real/synthetic labels. While these biased features boost performance on the training distribution, they cause substantial degradation on unbiased datasets. A common remedy is dataset alignment via generative reconstruction, which matches the semantic content of real and synthetic images. We revisit this approach and show that pixel-level alignment alone is insufficient: the reconstructed images still exhibit frequency-level misalignment, which can perpetuate spurious correlations. Specifically, we observe that reconstruction models tend to restore high-frequency details that were lost in the real images (likely due to JPEG compression), inadvertently making synthetic images appear to contain richer high-frequency content than real ones. Detectors then learn to associate high-frequency features with the synthetic label, reinforcing biased cues. To resolve this, we propose Dual Data Alignment (DDA), which aligns both the pixel and frequency domains. We further introduce two new test sets: DDA-COCO, containing DDA-aligned synthetic images for evaluating detectors on a maximally aligned dataset, and EvalGEN, featuring the latest generative models for assessing detectors under new generative architectures such as visual autoregressive generators. Finally, extensive evaluations show that a detector trained exclusively on DDA-aligned MSCOCO improves by a non-trivial margin across 8 diverse benchmarks, including a +7.2% gain on in-the-wild benchmarks, highlighting the improved generalizability of unbiased detectors. Our code is available at: https://github.com/roy-ch/Dual-Data-Alignment.
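To make the frequency-level misalignment concrete, the sketch below is an illustrative assumption rather than part of the released code: the function name `high_freq_energy` and the radial cutoff are hypothetical. It estimates the fraction of spectral energy above a cutoff frequency; if a pixel-aligned reconstruction restores high frequencies that JPEG compression removed from the real image, this statistic will tend to separate the two classes even when their semantic content matches.

```python
import numpy as np

def high_freq_energy(img_gray: np.ndarray, cutoff_ratio: float = 0.25) -> float:
    """Fraction of 2-D spectral energy above a radial cutoff (hypothetical helper)."""
    # Centered power spectrum of the grayscale image.
    spectrum = np.fft.fftshift(np.fft.fft2(img_gray))
    power = np.abs(spectrum) ** 2

    # Radial distance of each frequency bin from the spectrum center.
    h, w = img_gray.shape
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)

    # Energy share beyond the cutoff, relative to total energy.
    cutoff = cutoff_ratio * min(h, w) / 2
    return power[radius > cutoff].sum() / power.sum()

# Assumed usage: if reconstruction restores high frequencies lost to JPEG,
# high_freq_energy(reconstructed) tends to exceed high_freq_energy(real_jpeg),
# giving a detector a spurious "synthetic" cue unrelated to image content.
```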