Vision foundation models like the Segment Anything Model (SAM), pretrained on large-scale natural image datasets, often struggle in medical image segmentation because they lack domain-specific adaptation. In clinical practice, efficiently fine-tuning such models for downstream medical tasks under tight resource budgets, while maintaining strong performance, remains challenging. To address these issues, we propose BALR-SAM, a boundary-aware low-rank adaptation framework that enhances SAM for medical imaging. It combines three tailored components: (1) a Complementary Detail Enhancement Network (CDEN) that uses depthwise separable convolutions and multi-scale fusion to capture the boundary-sensitive features essential for accurate segmentation; (2) low-rank adapters integrated into SAM's Vision Transformer blocks to adapt feature representation and attention to medical contexts while sharply reducing the trainable parameter space; and (3) a low-rank tensor attention mechanism in the mask decoder that cuts memory usage by 75% and boosts inference speed. Experiments on standard medical segmentation datasets show that BALR-SAM, without requiring prompts, outperforms several state-of-the-art (SOTA) methods, including fully fine-tuned MedSAM, while updating just 1.8% (11.7M) of its parameters.
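The abstract gives only the high-level design of CDEN. As a rough illustration of the two named ingredients, depthwise separable convolutions and multi-scale fusion, the following is a minimal PyTorch sketch; the module names, dilation rates, and concatenate-then-fuse strategy are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSConv(nn.Module):
    """Depthwise separable conv: per-channel spatial filter + 1x1 channel mix."""
    def __init__(self, c_in, c_out, k=3, dilation=1):
        super().__init__()
        pad = dilation * (k // 2)
        self.dw = nn.Conv2d(c_in, c_in, k, padding=pad, dilation=dilation,
                            groups=c_in, bias=False)  # spatial, one filter per channel
        self.pw = nn.Conv2d(c_in, c_out, 1, bias=False)  # 1x1 channel mixing
        self.bn = nn.BatchNorm2d(c_out)

    def forward(self, x):
        return F.relu(self.bn(self.pw(self.dw(x))))

class MultiScaleFusion(nn.Module):
    """Hypothetical fusion block: parallel DSConv branches at several dilation
    rates, concatenated and fused by a 1x1 conv, mixing fine boundary detail
    with wider context."""
    def __init__(self, c, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(DSConv(c, c, dilation=d) for d in dilations)
        self.fuse = nn.Conv2d(c * len(dilations), c, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```

Depthwise separable convolutions keep such a detail branch cheap: a k x k depthwise filter plus a 1x1 pointwise mix costs far fewer parameters than a full k x k convolution over all channel pairs, which is consistent with the framework's small trainable footprint.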
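The low-rank adapters in (2) can likewise be sketched as a frozen pretrained projection plus a trainable rank-r update, following the standard LoRA formulation; the rank, scaling, initialization, and which SAM projections are adapted are assumptions here, since the abstract does not specify them.

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained projection W plus a trainable low-rank update:
    y = W x + (alpha / r) * B(A(x)), with A: d_in -> r and B: r -> d_out."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # SAM's pretrained weights stay frozen
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.normal_(self.lora_a.weight, std=0.02)
        nn.init.zeros_(self.lora_b.weight)  # zero update: training starts from the pretrained model
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Hypothetical usage: wrap a projection inside a SAM ViT attention block, e.g.
#   blk.attn.qkv = LoRALinear(blk.attn.qkv, r=4)
```

For intuition on the decoder-side memory claim (an assumption, not the paper's derivation): factorizing a d x d attention projection into two rank-r factors stores 2dr weights instead of d^2, so at r = d/8 the projection memory drops to d^2/4, exactly a 75% reduction.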