Eye-tracking data quantifies the attentional bias towards negative stimuli that is frequently observed in depressed groups, while audio and video data capture the affective flattening and psychomotor retardation characteristic of depression. Statistical validation confirmed the significant discriminative power of these modalities in distinguishing depressed from non-depressed groups. We address a critical limitation of existing graph-based models, which focus on low-frequency information, and propose a Multi-Frequency Graph Convolutional Network (MF-GCN). This framework is built around a novel Multi-Frequency Filter Bank Module (MFFBM), which leverages both low- and high-frequency signals. Extensive evaluation against traditional machine learning algorithms and deep learning frameworks demonstrates that MF-GCN consistently outperforms these baselines. In binary (depressed vs. non-depressed) classification, the model achieved a sensitivity of 0.96 and an F2 score of 0.94. For the three-class (no depression, mild-to-moderate depression, and severe depression) classification task, the proposed method achieved a sensitivity of 0.79 and a specificity of 0.87, significantly surpassing other models. To validate generalizability, the model was also evaluated on the Chinese Multimodal Depression Corpus (CMDC) dataset, achieving a sensitivity of 0.95 and an F2 score of 0.96. These results confirm that our trimodal, multi-frequency framework effectively captures cross-modal interactions for accurate depression detection.
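To make the multi-frequency idea concrete, the sketch below shows one common way to combine a low-pass and a high-pass graph filter in a single layer: the symmetrically normalized adjacency Â acts as a low-pass filter (smoothing neighboring nodes), and I − Â acts as a high-pass filter (emphasizing differences between a node and its neighbors). This is a minimal, hedged illustration of the general technique, not the paper's actual MFFBM; the class name, layer widths, and fusion scheme are assumptions for demonstration only.

```python
# Minimal sketch of a two-branch (low-/high-pass) graph filter bank in PyTorch.
# Hypothetical names and dimensions; the real MFFBM design is not specified here.
import torch
import torch.nn as nn


def normalized_adjacency(adj: torch.Tensor) -> torch.Tensor:
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = adj + torch.eye(adj.size(0))
    d_inv_sqrt = torch.diag(a.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ a @ d_inv_sqrt


class MultiFrequencyFilterBank(nn.Module):
    """Applies a low-pass filter (A_hat) and a high-pass filter (I - A_hat)
    to node features, then fuses the two frequency branches."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.low = nn.Linear(in_dim, out_dim)
        self.high = nn.Linear(in_dim, out_dim)
        self.fuse = nn.Linear(2 * out_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        a_hat = normalized_adjacency(adj)
        eye = torch.eye(adj.size(0))
        low = self.low(a_hat @ x)            # low-pass: smooths over neighbors
        high = self.high((eye - a_hat) @ x)  # high-pass: node-neighbor contrast
        return torch.relu(self.fuse(torch.cat([low, high], dim=-1)))


if __name__ == "__main__":
    # Toy usage: 6 nodes (e.g., modality-specific feature nodes), 16-d features.
    x = torch.randn(6, 16)
    adj = (torch.rand(6, 6) > 0.5).float()
    adj = ((adj + adj.t()) > 0).float()  # symmetrize the random graph
    out = MultiFrequencyFilterBank(16, 8)(x, adj)
    print(out.shape)  # torch.Size([6, 8])
```

Stacking such layers lets the network retain high-frequency (node-contrast) information that a purely low-pass GCN would progressively smooth away, which is the limitation the abstract attributes to existing graph-based models.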