Neuropsychiatric disorders, such as Alzheimer's disease (AD), depression, and autism spectrum disorder (ASD), are characterized by linguistic and acoustic abnormalities, offering potential biomarkers for early detection. Despite the promise of multi-modal approaches, challenges like multi-lingual generalization and the absence of a unified evaluation framework persist. To address these gaps, we propose FEND (Foundation model-based Evaluation of Neuropsychiatric Disorders), a comprehensive multi-modal framework integrating speech and text modalities for detecting AD, depression, and ASD across the lifespan. Leveraging 13 multi-lingual datasets spanning English, Chinese, Greek, French, and Dutch, we systematically evaluate multi-modal fusion performance. Our results show that multi-modal fusion excels in AD and depression detection but underperforms in ASD due to dataset heterogeneity. We also identify modality imbalance as a prevalent issue, where multi-modal fusion fails to surpass the best mono-modal models. Cross-corpus experiments reveal robust performance in task- and language-consistent scenarios but noticeable degradation in multi-lingual and task-heterogeneous settings. By providing extensive benchmarks and a detailed analysis of performance-influencing factors, FEND advances the field of automated, lifespan-inclusive, and multi-lingual neuropsychiatric disorder assessment. We encourage researchers to adopt the FEND framework for fair comparisons and reproducible research.