Rotation invariance is essential for precise, object-level segmentation in UAV aerial imagery, where targets can appear at arbitrary orientations and exhibit fine-scale details. Conventional segmentation architectures such as U-Net rely on convolution operators that are not rotation-invariant, degrading segmentation accuracy across varying viewpoints. Rotation invariance can be achieved by expanding the filter bank across multiple orientations; however, this significantly increases computational cost and memory traffic. In this paper, we introduce a GPU-optimized rotation-invariant convolution framework that eliminates the data-lowering (im2col) step traditionally required for matrix-multiplication-based convolution. By exploiting structured data sharing among symmetrically rotated filters, our method performs multi-orientation convolution with greatly reduced memory traffic and computational redundancy. We further generalize the approach to accelerate convolution at arbitrary (non-symmetric) rotation angles. Across extensive benchmarks, the proposed convolution achieves 20--55% faster training and 15--45% lower energy consumption than cuDNN, while maintaining accuracy comparable to state-of-the-art rotation-invariant methods. In the eight-orientation setting, our approach achieves up to 45% speedup and 41% energy savings on 256\(\times\)256 inputs, and 32% speedup with 23% lower energy usage on 1024\(\times\)1024 inputs. Integrated into a U-Net segmentation model, the framework yields up to a 6% accuracy improvement over a non-rotation-aware baseline. These results demonstrate that the proposed method is an effective and highly efficient alternative to existing rotation-invariant CNN frameworks.