Underwater acoustic target recognition (UATR) is extremely challenging due to the complexity of ship-radiated noise and the variability of ocean environments. Although deep learning (DL) approaches have achieved promising results, most existing models implicitly assume that underwater acoustic data lie in a Euclidean space. This assumption, however, is ill-suited to the inherently complex topology of underwater acoustic signals, which exhibit non-stationary, non-Gaussian, and nonlinear characteristics. To overcome this limitation, this paper proposes the UATR-GTransformer, a non-Euclidean DL model that integrates Transformer architectures with graph neural networks (GNNs). The model comprises three key components: a Mel patchify block, a GTransformer block, and a classification head. The Mel patchify block partitions the Mel-spectrogram into overlapping patches, while the GTransformer block employs a Transformer encoder to capture mutual information among the patches and generate Mel-graph embeddings. A GNN then enhances these embeddings by modeling local neighborhood relationships, and a feed-forward network (FFN) performs a further feature transformation. Experimental results on two widely used benchmark datasets demonstrate that the UATR-GTransformer achieves performance competitive with state-of-the-art methods. In addition, interpretability analysis reveals that the proposed model effectively extracts rich frequency-domain information, highlighting its potential for applications in ocean engineering.
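The pipeline described above (Mel patchify block, GTransformer block, classification head) could be sketched roughly as below in PyTorch. This is a minimal illustrative sketch, not the paper's implementation: the patch size, stride, embedding dimension, neighborhood size k, network depth, number of classes, and the k-NN graph construction with a mean-aggregation GNN layer are all assumptions standing in for the paper's actual settings and graph operator.

```python
# Hypothetical sketch of the UATR-GTransformer pipeline described in the abstract.
# All hyperparameters and the specific GNN layer are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MelPatchify(nn.Module):
    """Split a Mel-spectrogram into overlapping patches and embed them."""
    def __init__(self, patch=16, stride=8, dim=192):
        super().__init__()
        # Overlap comes from stride < kernel size in a strided convolution.
        self.proj = nn.Conv2d(1, dim, kernel_size=patch, stride=stride)

    def forward(self, x):                     # x: (B, 1, n_mels, time)
        p = self.proj(x)                      # (B, dim, H', W')
        return p.flatten(2).transpose(1, 2)   # (B, N_patches, dim)


class GTransformerBlock(nn.Module):
    """Transformer encoder -> k-NN graph over embeddings -> GNN -> FFN."""
    def __init__(self, dim=192, heads=4, k=8):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(dim, heads, dim * 4,
                                                  batch_first=True)
        self.gnn = nn.Linear(dim, dim)        # simple mean-aggregation GNN layer (assumption)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(),
                                 nn.Linear(dim * 4, dim))
        self.k = k

    def forward(self, x):                     # x: (B, N, dim)
        x = self.encoder(x)                   # Mel-graph embeddings
        # Build a k-NN neighborhood graph from pairwise embedding distances.
        d = torch.cdist(x, x)                 # (B, N, N)
        idx = d.topk(self.k, largest=False).indices          # (B, N, k)
        neigh = torch.gather(
            x.unsqueeze(1).expand(-1, x.size(1), -1, -1), 2,
            idx.unsqueeze(-1).expand(-1, -1, -1, x.size(-1)))  # (B, N, k, dim)
        x = x + F.gelu(self.gnn(neigh.mean(dim=2)))            # aggregate neighbors
        return x + self.ffn(x)                # feed-forward transformation


class UATRGTransformer(nn.Module):
    def __init__(self, n_classes=5, dim=192, depth=4):
        super().__init__()
        self.patchify = MelPatchify(dim=dim)
        self.blocks = nn.Sequential(*[GTransformerBlock(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, n_classes)  # classification head

    def forward(self, mel):                   # mel: (B, 1, n_mels, time)
        x = self.blocks(self.patchify(mel))
        return self.head(x.mean(dim=1))       # pool patches, then classify


# Usage example on a random Mel-spectrogram batch.
logits = UATRGTransformer()(torch.randn(2, 1, 128, 256))   # -> (2, 5)
```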