Flat minima are widely believed to correlate with improved generalisation in deep neural networks. However, this connection has proven more nuanced in recent studies, with both theoretical counterexamples and empirical exceptions emerging in the literature. In this paper, we revisit the role of sharpness in model performance, proposing that sharpness is better understood as a function-dependent property rather than a reliable indicator of poor generalisation. We conduct extensive empirical studies, from single-objective optimisation to modern image classification tasks, showing that sharper minima often emerge when models are regularised (e.g., via SAM, weight decay, or data augmentation), and that these sharp minima can coincide with better generalisation, calibration, robustness, and functional consistency. Across a range of models and datasets, we find that baselines without regularisation tend to converge to flatter minima yet often perform worse across all safety metrics. Our findings demonstrate that function complexity, rather than flatness alone, governs the geometry of solutions, and that sharper minima can reflect more appropriate inductive biases (especially under regularisation), calling for a function-centric reappraisal of loss landscape geometry.
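Since SAM is the main regulariser referenced above, a minimal sketch of one SAM update step may help fix ideas. This is an illustrative two-pass implementation in the spirit of Foret et al., assuming PyTorch; `model`, `loss_fn`, `base_optimizer`, the batch `(x, y)`, and the radius `rho` are placeholder names, not the paper's actual training code.

```python
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    # First pass: gradients of the loss at the current weights w.
    loss = loss_fn(model(x), y)
    loss.backward()

    # Compute the global gradient norm ||g|| over all parameters.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))

    # Perturb weights to w + eps, where eps = rho * g / ||g|| is the
    # local ascent direction that (approximately) maximises the loss.
    eps_list = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            eps = rho * p.grad / (grad_norm + 1e-12)
            p.add_(eps)
            eps_list.append((p, eps))
    model.zero_grad()

    # Second pass: the sharpness-aware gradient, evaluated at w + eps.
    loss_fn(model(x), y).backward()

    # Restore the original weights, then step with the base optimiser
    # using the gradient from the perturbed point.
    with torch.no_grad():
        for p, eps in eps_list:
            p.sub_(eps)
    base_optimizer.step()
    base_optimizer.zero_grad()
    return loss.item()
```

The key design point SAM illustrates is that the update direction is taken at the worst-case nearby weights rather than at the minimum itself, which is why models trained this way can still settle into regions whose pointwise curvature is sharp.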