Large-scale medical biobanks provide imaging data complemented by extensive tabular information, such as demographics or clinical measurements. However, this abundance of tabular attributes is not representative of real-world datasets, where only a subset of attributes may be available. This discrepancy calls for methods that can leverage all of the tabular data during training while remaining robust to missing values at inference time. To address this challenge, we propose RoVTL (Robust Vision-Tabular Learning), a framework designed to handle any level of tabular data availability, from 0% to 100%. RoVTL comprises two key stages: contrastive pretraining, where we introduce tabular attribute missingness as a data augmentation to promote robustness, and downstream task tuning using a gated cross-attention module for multimodal fusion. During fine-tuning, we employ a novel Tabular More vs. Fewer loss that ranks performance according to the amount of available tabular data. Combined with disentangled gradient learning, this enables consistent performance across all tabular data completeness scenarios. We evaluate RoVTL on cardiac MRI scans from the UK Biobank, demonstrating superior robustness to missing tabular data compared to prior methods. Furthermore, RoVTL generalizes to an external cardiac MRI dataset for multimodal disease classification, and extends to the natural image domain, achieving robust performance on a car advertisements dataset. The code is available at https://github.com/marteczkah/RoVTL.
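As a rough illustration of two mechanisms named above, the sketch below shows (i) tabular-attribute masking used as a pretraining augmentation and (ii) a gated cross-attention block that injects tabular tokens into image tokens, assuming a PyTorch setup. The function and class names, the zero-masking strategy, and the zero-initialised gate are illustrative assumptions, not the released RoVTL implementation.

```python
# Minimal sketch (not the authors' code) of missingness augmentation and
# gated cross-attention fusion. All names and design details are assumptions.
import torch
import torch.nn as nn


def mask_tabular(x: torch.Tensor, missing_rate: float) -> torch.Tensor:
    """Simulate missing tabular attributes by zeroing a random subset.

    x: (batch, n_attrs) tensor of tabular features.
    missing_rate: fraction of attributes to drop (zero-filling is an
    assumption; a learned "missing" embedding would also fit the idea).
    """
    keep = (torch.rand_like(x) >= missing_rate).float()
    return x * keep


class GatedCrossAttention(nn.Module):
    """Image tokens attend to tabular tokens; a learned gate (initialised to
    zero, so fusion starts image-only) scales the injected tabular signal."""

    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # tanh(0) = 0 at initialisation
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens: torch.Tensor, tab_tokens: torch.Tensor) -> torch.Tensor:
        # img_tokens: (batch, n_img_tokens, dim); tab_tokens: (batch, n_attrs, dim)
        fused, _ = self.attn(self.norm(img_tokens), tab_tokens, tab_tokens)
        return img_tokens + torch.tanh(self.gate) * fused


# Toy usage: 8 samples, 12 image tokens, 5 tabular attributes embedded to dim 64.
raw_tab = torch.randn(8, 5)
raw_tab = mask_tabular(raw_tab, missing_rate=0.5)        # simulate ~50% missing attributes
tab_tokens = nn.Linear(1, 64)(raw_tab.unsqueeze(-1))     # embed each attribute as a token
img_tokens = torch.randn(8, 12, 64)
print(GatedCrossAttention(dim=64)(img_tokens, tab_tokens).shape)  # torch.Size([8, 12, 64])
```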