In recent years, high-performance computer vision models have achieved remarkable success in medical imaging, with some skin lesion classification systems even surpassing dermatology specialists in diagnostic accuracy. However, such models are computationally intensive and have large parameter counts, making them unsuitable for deployment on edge devices. In addition, strict privacy constraints hinder centralized data management, motivating the adoption of Federated Learning (FL). To address these challenges, this study proposes a skewness-guided pruning method that selectively prunes the Multi-Head Self-Attention (MHSA) and Multi-Layer Perceptron (MLP) layers of a multimodal Swin Transformer based on the statistical skewness of their output distributions. The proposed method was validated in a horizontal FL environment and shown to maintain performance while substantially reducing model complexity. Experiments on the compact Swin Transformer demonstrate a model size reduction of approximately 36\% with no loss in accuracy. These findings highlight the feasibility of efficient model compression and privacy-preserving distributed learning for multimodal medical AI on edge devices.
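The abstract does not spell out how layer skewness translates into a pruning decision, so the following is only an illustrative sketch: it computes the Fisher-Pearson skewness of each layer's flattened output and ranks layers as pruning candidates. The selection criterion (pruning layers whose outputs are nearly symmetric, i.e. low absolute skewness), the threshold, and the function names are all hypothetical assumptions, not the paper's actual method.

```python
import numpy as np

def skewness(x: np.ndarray) -> float:
    """Fisher-Pearson skewness of a flattened activation tensor."""
    x = x.ravel().astype(np.float64)
    mu, sigma = x.mean(), x.std()
    return float(((x - mu) ** 3).mean() / sigma ** 3)

def select_prune_candidates(layer_outputs: dict, threshold: float) -> list:
    """Rank layers by |skewness| of their outputs and return those below
    `threshold`, least-skewed first.

    Hypothetical criterion: near-symmetric output distributions are
    treated as carrying less distributional structure and pruned first.
    `layer_outputs` maps layer names (e.g. MHSA / MLP blocks) to
    sampled output activations.
    """
    scores = {name: abs(skewness(out)) for name, out in layer_outputs.items()}
    return [name
            for name, s in sorted(scores.items(), key=lambda kv: kv[1])
            if s < threshold]
```

In a real pipeline these activations would be collected with forward hooks on the Swin Transformer blocks during a calibration pass, and the selected blocks would then be pruned before (or during) the federated training rounds.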