Neural scaling laws--power-law relationships between generalization error and characteristics of deep learning models--are vital tools for developing reliable models while managing limited resources. Although the success of large language models highlights the importance of these laws, their application to deep regression models remains largely unexplored. Here, we empirically investigate neural scaling laws in deep regression using a parameter estimation model for twisted van der Waals magnets. Across various architectures--including fully connected networks, residual networks, and vision transformers--we observe power-law relationships between the loss and both the training dataset size and the model capacity over a wide range of values. Furthermore, the scaling exponents governing these relationships range from 1 to 2, with the specific values depending on the regressed parameters and the model details. The consistent scaling behavior and the large scaling exponents suggest that the performance of deep regression models can improve substantially as the dataset size grows.
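To make the fitted quantity concrete, the sketch below shows one common way a data-scaling exponent can be extracted: a power law L(D) = a * D^(-alpha) becomes a straight line in log-log coordinates, so alpha follows from the slope of a least-squares fit. This is a minimal illustration, not the paper's procedure; the numerical values and the use of numpy.polyfit are assumptions for demonstration only.

```python
# Minimal sketch (illustrative, not the paper's method): fit a power law
# L(D) ~ a * D^(-alpha) to hypothetical (dataset size, test loss) pairs
# by linear regression in log-log space.
import numpy as np

# Hypothetical measurements: training-set sizes and corresponding test losses.
dataset_sizes = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
test_losses = np.array([2.1e-2, 4.9e-3, 1.1e-3, 2.6e-4, 6.0e-5])

# Taking logs linearizes the power law: log L = log a - alpha * log D,
# so the fitted slope gives -alpha.
slope, intercept = np.polyfit(np.log(dataset_sizes), np.log(test_losses), deg=1)
alpha = -slope
prefactor = np.exp(intercept)

print(f"fitted exponent alpha ~ {alpha:.2f}")   # exponents between 1 and 2 indicate fast gains from more data
print(f"fitted prefactor a ~ {prefactor:.3g}")
```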