Medical artificial intelligence (AI) systems, particularly multimodal vision-language models (VLMs), often exhibit intersectional biases: they are systematically less confident when diagnosing marginalised patient subgroups. This bias stems from demographically skewed data and divergent distributions of diagnostic certainty, and it can lead to higher rates of inaccurate and missed diagnoses. Current fairness interventions frequently fail to close these gaps, or they sacrifice overall diagnostic performance to achieve statistical parity among subgroups. In this study, we developed Cross-Modal Alignment Consistency (CMAC-MMD), a training framework that uses maximum mean discrepancy (MMD) to standardise diagnostic certainty across intersectional patient subgroups. Unlike traditional debiasing methods, this approach equalises the model's decision confidence without requiring sensitive demographic data during clinical inference. We evaluated the approach on 10,015 skin lesion images (HAM10000), with external validation on 12,000 images (BCN20000), and on 10,000 fundus images for glaucoma detection (Harvard-FairVLMed), stratifying performance by intersectional age, gender, and race attributes. In the dermatology cohort, the proposed method reduced the overall intersectional missed diagnosis gap (difference in True Positive Rate, $Δ$TPR) from 0.50 to 0.26 while improving the overall Area Under the Curve (AUC) from 0.94 to 0.97 compared to standard training. Similarly, for glaucoma screening, the method reduced $Δ$TPR from 0.41 to 0.31 while slightly improving AUC to 0.72 (vs. 0.71 for the baseline). This establishes a scalable framework for developing high-stakes clinical decision support systems that are accurate and perform equitably across diverse patient subgroups, without increasing privacy risks.
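
To make the $Δ$TPR metric concrete, the following is a minimal sketch of how a True Positive Rate gap across intersectional subgroups might be computed. The abstract does not say how the gap is aggregated, so this sketch assumes it is the difference between the highest and lowest per-subgroup TPR. The function name `tpr_gap` and the tuple encoding of subgroups (for example age band, gender, race) are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def tpr_gap(y_true, y_pred, groups):
    """Return (gap, per_group_tpr) for binary labels and predictions.

    Assumption: the gap is max TPR minus min TPR over subgroups that
    contain at least one positive case. `groups` holds one hashable
    subgroup key per sample, e.g. a tuple ("60+", "F", "Black").
    """
    positives = defaultdict(int)       # positive cases per subgroup
    true_positives = defaultdict(int)  # correctly detected positives

    for label, pred, group in zip(y_true, y_pred, groups):
        if label == 1:
            positives[group] += 1
            if pred == 1:
                true_positives[group] += 1

    if not positives:
        raise ValueError("no positive cases in any subgroup")

    # TPR = detected positives / all positives, per subgroup
    per_group_tpr = {g: true_positives[g] / positives[g] for g in positives}
    gap = max(per_group_tpr.values()) - min(per_group_tpr.values())
    return gap, per_group_tpr
```

A gap of 0 under this definition means every subgroup's positive cases are detected at the same rate; a larger gap means at least one subgroup has more missed diagnoses than another. Subgroups with no positive cases are skipped because their TPR is undefined.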


