Understanding how biomarkers interact across brain regions during neurodegenerative disease is essential for unravelling the mechanisms underlying disease progression. For example, pathophysiological models of Alzheimer's disease (AD) typically describe how variables, such as regional levels of toxic proteins, interact spatiotemporally within a dynamical system driven by an underlying biological substrate, often based on brain connectivity. However, current methods grossly oversimplify the complex relationships among brain regions by assuming a single-modality brain connectome as the disease-spreading substrate. This leads to inaccurate predictions of pathology spread, especially over long-term progression. Meanwhile, methods that learn such a graph in a purely data-driven way face identifiability issues because they lack proper constraints. We thus present a novel framework that uses Large Language Models (LLMs) as expert guides on the interaction of regional variables to enhance the learning of disease progression from irregularly sampled longitudinal patient data. By leveraging LLMs' ability to synthesize multi-modal relationships and incorporate diverse disease-driving mechanisms, our method simultaneously optimizes 1) the construction of long-term disease trajectories from individual-level observations and 2) a biologically constrained graph structure that captures interactions among brain regions with better identifiability. We demonstrate the new approach by estimating pathology propagation using tau-PET imaging data from an AD cohort. The new framework achieves higher prediction accuracy and better interpretability than traditional approaches, while revealing additional disease-driving factors beyond conventional connectivity measures.
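To make the notion of a connectivity-driven dynamical system concrete, the sketch below simulates a generic network diffusion model, dx/dt = -beta * L x, on a single fixed graph. This is a common simplification of the single-connectome setting described above, not the framework proposed here; the toy adjacency matrix, the rate `beta`, the step size `dt`, and the function names are all illustrative assumptions.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Combinatorial Laplacian L = D - A of a symmetric weighted graph."""
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def simulate_spread(adjacency, x0, beta=0.5, dt=0.01, n_steps=1000):
    """Forward-Euler integration of dx/dt = -beta * L @ x.

    Returns an array of shape (n_steps + 1, n_regions) holding the
    regional pathology level at every time step.
    """
    lap = graph_laplacian(adjacency)
    x = np.asarray(x0, dtype=float).copy()
    trajectory = [x.copy()]
    for _ in range(n_steps):
        x = x - dt * beta * (lap @ x)
        trajectory.append(x.copy())
    return np.stack(trajectory)


# Hypothetical 3-region chain graph: region 0 - region 1 - region 2.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

# Pathology is seeded in region 0 only.
traj = simulate_spread(A, x0=[1.0, 0.0, 0.0])
```

With a symmetric graph, this model conserves the total pathology load and drives every region toward the same level, which illustrates why a single fixed substrate can be too rigid for long-term prediction.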