Foundation models are emerging as a powerful paradigm for fMRI analysis, but current approaches face a dual bottleneck of data and training efficiency. Atlas-based methods aggregate voxel signals into fixed regions of interest, reducing data dimensionality but discarding fine-grained spatial detail, and they require extremely large cohorts to train effectively as general-purpose foundation models. Atlas-free methods, on the other hand, operate directly on voxel-level information, preserving spatial fidelity but at prohibitive memory and compute cost, which makes large-scale pre-training infeasible. We introduce SLIM-Brain (Sample-efficient, Low-memory fMRI Foundation Model for Human Brain), a new atlas-free foundation model that improves both data and training efficiency. SLIM-Brain adopts a two-stage adaptive design: (i) a lightweight temporal extractor captures global context across full sequences and ranks data windows by saliency, and (ii) a 4D hierarchical encoder (Hiera-JEPA) learns fine-grained voxel-level representations only from the top-$k$ selected windows, while discarding about 70% of patches as masked. Extensive experiments on seven public benchmarks show that SLIM-Brain sets a new state of the art on diverse tasks while requiring only 4,000 pre-training sessions and approximately 30% of the GPU memory of traditional voxel-level methods.
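The two-stage idea of saliency-ranked window selection followed by aggressive patch dropping can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the shapes are hypothetical, and a simple variance heuristic stands in for the learned lightweight temporal extractor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a 4D fMRI run split into T candidate windows,
# each containing P spatial patches of feature dimension D.
T, P, D = 16, 512, 32
windows = rng.standard_normal((T, P, D)).astype(np.float32)

# Stage (i), simplified: score each window's saliency. The paper's
# lightweight temporal extractor is replaced here by a per-window
# variance heuristic purely for illustration.
saliency = windows.var(axis=(1, 2))

# Keep only the top-k most salient windows for the heavy encoder.
k = 4
topk_idx = np.argsort(saliency)[-k:]
selected = windows[topk_idx]  # (k, P, D)

# Stage (ii), simplified: drop ~70% of patches before encoding, so the
# expensive 4D encoder only processes the remaining ~30%.
mask_ratio = 0.7
n_keep = int(P * (1 - mask_ratio))
keep_idx = rng.permutation(P)[:n_keep]
visible = selected[:, keep_idx]  # (k, n_keep, D)

print(visible.shape)  # (4, 153, 32)
```

Because the encoder only ever sees the $k$ selected windows and the ~30% of patches left visible, both activation memory and compute shrink roughly in proportion, which is the source of the efficiency gains claimed above.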