Moving infrared small target detection (IRSTD) plays a critical role in practical applications such as the surveillance of unmanned aerial vehicles (UAVs) and UAV-based search systems. Moving IRSTD remains highly challenging because target features are weak and background interference is complex. Accurate spatio-temporal feature modeling is crucial for moving target detection and is typically achieved through either temporal differencing or spatio-temporal (3D) convolution. Temporal differencing explicitly leverages motion cues but has limited capability for extracting spatial features, whereas 3D convolution represents spatio-temporal features effectively yet lacks explicit awareness of motion dynamics along the temporal dimension. In this paper, we propose a novel moving IRSTD network, TDCNet, which extracts and enhances spatio-temporal features for accurate target detection. Specifically, we introduce a novel temporal difference convolution (TDC) re-parameterization module comprising three parallel TDC blocks designed to capture contextual dependencies across different temporal ranges. Each TDC block fuses temporal differencing and 3D convolution into a unified spatio-temporal convolution representation. The re-parameterized module captures multi-scale motion contextual features while suppressing pseudo-motion clutter in complex backgrounds, significantly improving detection performance. Moreover, we propose a TDC-guided spatio-temporal attention mechanism that performs cross-attention between the spatio-temporal features from the TDC-based backbone and those from a parallel 3D backbone. This mechanism models their global semantic dependencies to refine the features of the current frame. Extensive experiments on IRSTD-UAV and public infrared datasets demonstrate that TDCNet achieves state-of-the-art performance in moving target detection.
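The idea of fusing temporal differencing and 3D convolution into a single convolution can be illustrated with a small numerical sketch. It is not the TDCNet implementation: the abstract does not give the form of the TDC blocks or of the re-parameterization, so the example only assumes a plain adjacent-frame difference followed by one 3D kernel, and all function names are hypothetical. Because both operations are linear, the difference can be folded into a kernel that is one frame longer and applied directly to the raw clip.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate3d_valid(x, kernel):
    """Valid-mode 3D cross-correlation over a (T, H, W) array."""
    windows = sliding_window_view(x, kernel.shape)  # (T', H', W', kt, kh, kw)
    return np.einsum("thwabc,abc->thw", windows, kernel)


def temporal_difference(x):
    """Adjacent-frame difference d[t] = x[t+1] - x[t] (assumed form)."""
    return x[1:] - x[:-1]


def merge_difference_into_kernel(kernel):
    """Fold the frame difference into the kernel (hypothetical helper).

    Applying `kernel` to d[t] = x[t+1] - x[t] equals applying a kernel with
    one extra temporal tap, merged[m] = kernel[m-1] - kernel[m], to x itself.
    """
    _, kh, kw = kernel.shape
    pad = np.zeros((1, kh, kw), dtype=kernel.dtype)
    shifted = np.concatenate([pad, kernel], axis=0)   # kernel[m-1]
    original = np.concatenate([kernel, pad], axis=0)  # kernel[m]
    return shifted - original


rng = np.random.default_rng(0)
clip = rng.standard_normal((5, 8, 8))    # toy clip: 5 frames of 8x8 pixels
kernel = rng.standard_normal((3, 3, 3))  # toy 3D kernel (kt, kh, kw)

# Two-step path: difference the frames, then convolve.
two_step = correlate3d_valid(temporal_difference(clip), kernel)

# One-step path: a single convolution with the merged kernel.
merged_kernel = merge_difference_into_kernel(kernel)
one_step = correlate3d_valid(clip, merged_kernel)

max_abs_gap = float(np.max(np.abs(two_step - one_step)))
```

Under these assumptions the two paths agree up to floating-point error, which is the property that lets a difference branch be merged into a single convolution after training.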