Timely and personalized treatment decisions are essential across a wide range of healthcare settings, where patient responses can vary significantly and evolve over time. The clinical data that support these decisions are often irregularly sampled, and how often values are missing may itself convey information about the patient's condition. Existing reinforcement learning (RL)-based clinical decision support systems often ignore these missingness patterns or distort them through coarse discretization and simple imputation. They are also predominantly model-free and rely heavily on retrospective data, which can lead to insufficient exploration and to policies biased toward historical clinical behavior. To address these limitations, we propose medDreamer, a novel model-based reinforcement learning framework for personalized treatment recommendation. medDreamer comprises two parts. The first is a world model with an Adaptive Feature Integration module, which simulates latent patient states from irregularly sampled data. The second is a two-phase policy trained on a hybrid of real and imagined trajectories. This design allows medDreamer to learn policies that improve on the sub-optimal decisions recorded in historical data while remaining grounded in real clinical data. We evaluate medDreamer on sepsis and mechanical ventilation treatment tasks using two large-scale Electronic Health Record (EHR) datasets. Comprehensive evaluations show that medDreamer significantly outperforms both model-free and model-based baselines on clinical outcomes and off-policy evaluation metrics.
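The claim that simple imputation distorts missingness patterns can be illustrated with a small sketch. This is not medDreamer's Adaptive Feature Integration module, whose internals are not described here. It is a generic, hypothetical encoding in the spirit of mask-and-time-interval representations for irregular clinical time series. For each time step it keeps the forward-filled value, a binary observed mask, and the time elapsed since the last real measurement. The function name `encode_irregular`, the 0.0 default before the first measurement, and the lactate-like example values are all illustrative assumptions.

```python
from typing import List, Optional, Tuple


def encode_irregular(
    values: List[Optional[float]],
    times: List[float],
) -> List[Tuple[float, int, float]]:
    """Encode one irregularly sampled variable as (value, observed_mask, time_since_last_obs).

    Hypothetical sketch: values[i] is None when the variable was not measured at times[i].
    Missing values are forward-filled (0.0 before the first measurement, an assumed default),
    while the mask and the elapsed time preserve how often and how recently it was measured.
    """
    encoded = []
    last_value = 0.0                                # assumed default before any measurement
    last_obs_time = times[0] if times else 0.0      # before any measurement, count from record start
    for value, t in zip(values, times):
        if value is not None:
            last_value = value                      # a real measurement updates the carried value
            last_obs_time = t
            mask = 1
        else:
            mask = 0                                # imputed step: carry the last value forward
        delta = t - last_obs_time                   # time since the most recent real measurement
        encoded.append((last_value, mask, delta))
    return encoded


# Two hypothetical patients whose forward-filled trajectories are identical,
# but one was measured hourly and the other only once.
times = [0.0, 1.0, 2.0, 3.0]
frequent = encode_irregular([2.0, 2.0, 2.0, 2.0], times)
sparse = encode_irregular([2.0, None, None, None], times)

print([v for v, _, _ in frequent] == [v for v, _, _ in sparse])  # True: imputed values match
print([m for _, m, _ in sparse])                                 # [1, 0, 0, 0]
print([d for _, _, d in sparse])                                 # [0.0, 1.0, 2.0, 3.0]
```

With forward-fill alone the two patients look the same. The mask and elapsed-time channels keep the difference in measurement frequency, which is the kind of signal the abstract argues is lost by simple imputation.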