Adversarially-regularized mixed effects deep learning (ARMED) models for improved interpretability, performance, and generalization on clustered data

Natural science datasets frequently violate assumptions of independence. Samples may be clustered (e.g. by study site, subject, or experimental batch), leading to spurious associations, poor model fitting, and confounded analyses. While largely unaddressed in deep learning, this problem has been handled in the statistics community through mixed effects models, which separate cluster-invariant fixed effects from cluster-specific random effects. We propose a general-purpose framework for Adversarially-Regularized Mixed Effects Deep learning (ARMED) models through non-intrusive additions to existing neural networks: 1) an adversarial classifier constraining the original model to learn only cluster-invariant features, 2) a random effects subnetwork capturing cluster-specific features, and 3) an approach to apply random effects to clusters unseen during training. We apply ARMED to dense, convolutional, and autoencoder neural networks on 4 applications including simulated nonlinear data, dementia prognosis and diagnosis, and live-cell image analysis. Compared to prior techniques, ARMED models better distinguish confounded from true associations in simulations and learn more biologically plausible features in clinical applications. They can also quantify inter-cluster variance and visualize cluster effects in data. Finally, ARMED improves accuracy on data from clusters seen during training (up to 28% vs. conventional models) and generalization to unseen clusters (up to 9% vs. conventional models).
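As a rough illustration of the first two components, the following NumPy sketch runs a single forward pass of a toy mixed effects network and assembles an adversarially regularized objective. It is a minimal sketch under stated assumptions, not the implementation used in this work: the layer sizes, the random-intercept form of the random effects, the names `armed_forward`, `armed_losses`, `W_h`, `w_out`, `W_adv`, `u`, and the weight `lam` are all hypothetical, and the full objective may contain further terms. The third component, applying random effects to unseen clusters, is not covered.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy clustered data: 4 clusters, each with its own random intercept (assumed setup).
n_clusters, n_per_cluster, n_features, n_hidden = 4, 50, 3, 8
cluster = np.repeat(np.arange(n_clusters), n_per_cluster)
X = rng.normal(size=(n_clusters * n_per_cluster, n_features))
true_intercepts = rng.normal(scale=2.0, size=n_clusters)
y = X @ np.array([1.0, -2.0, 0.5]) + true_intercepts[cluster]

# Hypothetical parameters for the three pieces of the sketch.
params = {
    "W_h": rng.normal(scale=0.5, size=(n_features, n_hidden)),  # shared hidden layer
    "w_out": rng.normal(scale=0.5, size=n_hidden),              # fixed effects output
    "u": np.zeros(n_clusters),                                  # cluster-specific intercepts
    "W_adv": np.zeros((n_hidden, n_clusters)),                  # adversarial classifier
}


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def armed_forward(X, cluster, params, n_clusters):
    """Return the fixed-effects prediction, the mixed-effects prediction,
    and the adversary's predicted cluster probabilities."""
    h = np.tanh(X @ params["W_h"])            # features the adversary inspects
    y_fixed = h @ params["w_out"]             # cluster-invariant part
    Z = np.eye(n_clusters)[cluster]           # one-hot cluster design matrix
    y_mixed = y_fixed + Z @ params["u"]       # add cluster-specific random effect
    p_cluster = softmax(h @ params["W_adv"])  # adversary tries to recover the cluster
    return y_fixed, y_mixed, p_cluster


def armed_losses(y, cluster, y_mixed, p_cluster, lam=1.0):
    """Task loss, adversary cross-entropy, and the main model's objective.
    The main model is rewarded when the adversary does poorly."""
    task = np.mean((y - y_mixed) ** 2)
    picked = p_cluster[np.arange(len(cluster)), cluster]
    adv = -np.mean(np.log(picked + 1e-12))
    return task, adv, task - lam * adv


y_fixed, y_mixed, p_cluster = armed_forward(X, cluster, params, n_clusters)
task, adv, total = armed_losses(y, cluster, y_mixed, p_cluster, lam=1.0)
print(f"task={task:.3f} adversary_ce={adv:.3f} main_objective={total:.3f}")
```

In an actual training loop the adversary would be updated to minimize its cross-entropy while the main network minimizes the combined objective, so that the shared features carry as little cluster information as possible and the cluster-specific signal is left to the random effects term.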
