Remote sensing (RS) image-text retrieval faces significant challenges in real-world datasets due to the presence of Pseudo-Matched Pairs (PMPs): semantically mismatched or weakly aligned image-text pairs that hinder the learning of reliable cross-modal alignments. To address this issue, we propose a novel retrieval framework that leverages Cross-Modal Gated Attention and a Positive-Negative Awareness Attention mechanism to mitigate the impact of such noisy associations. The gated module dynamically regulates cross-modal information flow, while the awareness mechanism explicitly distinguishes informative (positive) cues from misleading (negative) ones during alignment learning. Extensive experiments on three benchmark RS datasets, namely RSICD, RSITMD, and RS5M, demonstrate that our method consistently achieves state-of-the-art performance, highlighting its robustness and effectiveness in handling real-world mismatches and PMPs in RS image-text retrieval tasks.
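The abstract does not give the exact formulation of the gated module, but the idea of dynamically regulating cross-modal information flow can be illustrated with a minimal numpy sketch. Here, image region features attend over text token features, and a sigmoid gate (computed from the region feature and its attended text context) decides how much cross-modal information to admit; all weight names and the specific gating form are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_cross_attention(img, txt, Wq, Wk, Wv, Wg):
    """img: (n_regions, d) image features; txt: (n_tokens, d) text features.
    Wq, Wk, Wv: (d, d) projections; Wg: (2d, d) gate weights (assumed shapes)."""
    q, k, v = img @ Wq, txt @ Wk, txt @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # region-to-token attention
    ctx = attn @ v                                   # text context per image region
    # sigmoid gate over [region; context] regulates cross-modal information flow
    gate = 1.0 / (1.0 + np.exp(-np.concatenate([img, ctx], axis=-1) @ Wg))
    return gate * ctx + (1.0 - gate) * img           # gated fusion

```

A pair dominated by noisy text (a PMP) would ideally drive the gate toward the visual branch, suppressing the misleading cross-modal signal.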