Accurate organ and lesion segmentation is a critical prerequisite for computer-aided diagnosis. Convolutional neural networks (CNNs), constrained by their local receptive fields, often struggle to capture complex global anatomical structures. To address this limitation, we propose HyM-UNet, a hybrid architecture that combines the local feature extraction of CNNs with the efficient global modeling of Mamba. Specifically, we design a Hierarchical Encoder that uses convolutional modules in the shallow stages to preserve high-frequency texture details and Visual Mamba modules in the deep stages to capture long-range semantic dependencies with linear complexity. To bridge the semantic gap between the encoder and the decoder, we propose a Mamba-Guided Fusion Skip Connection (MGF-Skip), which uses deep semantic features as gating signals to dynamically suppress background noise in shallow features, thereby sharpening the perception of ambiguous boundaries. We conduct extensive experiments on the public ISIC 2018 benchmark dataset. The results show that HyM-UNet significantly outperforms existing state-of-the-art methods in Dice coefficient and IoU while using fewer parameters and achieving lower inference latency. These results support the effectiveness and robustness of the proposed method on medical segmentation tasks characterized by complex shapes and scale variations.
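
The gating idea behind MGF-Skip can be illustrated with a minimal sketch. The abstract does not specify the module's actual layers, so everything here is an assumption made for illustration: single-channel feature maps stored as nested lists, nearest-neighbour upsampling of the deep map, and a plain sigmoid gate. The helper names `upsample_nearest` and `gated_skip` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a 2D map by an integer factor."""
    height, width = len(feat), len(feat[0])
    return [
        [feat[i // factor][j // factor] for j in range(width * factor)]
        for i in range(height * factor)
    ]


def gated_skip(shallow, deep, factor):
    """Scale each shallow activation by a gate derived from the deep map.

    Assumption: the gate is sigmoid(upsampled deep feature). A real module
    would typically use learned projections before and after the gate.
    """
    deep_up = upsample_nearest(deep, factor)
    return [
        [s * sigmoid(d) for s, d in zip(s_row, d_row)]
        for s_row, d_row in zip(shallow, deep_up)
    ]


# Toy input: a uniform 4x4 shallow map and a 2x2 deep map in which
# positive values stand for "likely lesion" and negative for "background".
shallow = [[1.0] * 4 for _ in range(4)]
deep = [[6.0, -6.0], [-6.0, 6.0]]
gated = gated_skip(shallow, deep, factor=2)
```

In this toy case, shallow activations under a strongly positive deep response pass almost unchanged, while those under a strongly negative response are scaled close to zero, which is the background-suppression behaviour the abstract describes.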
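
For reference, the two reported metrics, Dice coefficient and IoU, can be computed from binary masks as sketched below. This is a generic illustration, not the paper's evaluation code: masks are assumed to be flat sequences of 0/1 values, the function name `dice_and_iou` is hypothetical, and returning 1.0 for two empty masks is a convention chosen here.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    union = total - intersection
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    dice = 2.0 * intersection / total
    iou = intersection / union
    return dice, iou


# Toy masks: one overlapping pixel, two foreground pixels in each mask.
pred = [1, 1, 0, 0]
target = [1, 0, 1, 0]
dice, iou = dice_and_iou(pred, target)
```

The two metrics are monotonically related through Dice = 2·IoU / (1 + IoU), so they always rank a pair of predictions on the same image in the same order, although averages over a dataset can differ.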