Clinical decision-making routinely demands reasoning over heterogeneous data, yet existing multimodal large language models (MLLMs) remain largely vision-centric and fail to generalize across clinical specialties. To bridge this gap, we introduce QoQ-Med-7B/32B, the first open generalist clinical foundation model that jointly reasons across medical images, time-series signals, and text reports. QoQ-Med is trained with Domain-aware Relative Policy Optimization (DRPO), a novel reinforcement-learning objective that hierarchically scales normalized rewards according to domain rarity and modality difficulty, mitigating the performance imbalance caused by skewed clinical data distributions. Training on 2.61 million instruction-tuning pairs spanning 9 clinical domains, we show that DRPO boosts diagnostic performance by 43% in macro-F1 on average across all visual domains compared with other critic-free training methods such as GRPO. Furthermore, when trained on intensive segmentation data, QoQ-Med can highlight salient regions relevant to the diagnosis, achieving an IoU 10x higher than open models while matching the performance of OpenAI o4-mini. To foster reproducibility and downstream research, we release (i) the full model weights, (ii) the modular training pipeline, and (iii) all intermediate reasoning traces at https://github.com/DDVD233/QoQ_Med.
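To make the hierarchical reward scaling concrete, the following is a minimal sketch (not the paper's actual DRPO implementation) of how GRPO-style group-normalized advantages could be rescaled by domain rarity and modality difficulty. The function names, the inverse-square-root rarity weight, and the (1 - accuracy) difficulty weight are illustrative assumptions; the released training pipeline defines the exact formulation.

```python
# Minimal sketch of a DRPO-style, domain-aware rescaling of GRPO advantages.
# The scaling forms below (1/sqrt(freq) and 1 - accuracy) are assumptions for
# illustration only, not the published DRPO objective.
import numpy as np

def group_normalized_advantages(rewards):
    """GRPO-style advantage: rewards normalized within one rollout group."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def drpo_style_advantages(rewards, domain_freq, modality_acc):
    """Hierarchically rescale normalized advantages.

    domain_freq:  fraction of training data drawn from this clinical domain
                  (rarer domains are upweighted).
    modality_acc: recent accuracy on this modality
                  (harder modalities are upweighted).
    """
    adv = group_normalized_advantages(rewards)
    rarity_weight = 1.0 / np.sqrt(domain_freq + 1e-8)   # assumed rarity scaling
    difficulty_weight = 1.0 - modality_acc              # assumed difficulty scaling
    return adv * rarity_weight * difficulty_weight

# Example: a rare, difficult ECG domain vs. a common, easier chest X-ray domain
# receives larger-magnitude advantages for the same rollout rewards.
ecg = drpo_style_advantages([0.0, 1.0, 1.0, 0.0], domain_freq=0.02, modality_acc=0.4)
cxr = drpo_style_advantages([0.0, 1.0, 1.0, 0.0], domain_freq=0.40, modality_acc=0.8)
print(ecg, cxr)
```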