Bias-variance decompositions are widely used to understand the generalization performance of machine learning models. While the squared error loss permits a straightforward decomposition, other loss functions, such as the zero-one loss or the $L_1$ loss, either do not allow bias and variance to sum to the expected loss or rely on definitions that lack the essential properties of meaningful bias and variance. Recent research has shown that clean decompositions can be achieved for the broader class of Bregman divergences, with the cross-entropy loss as a special case. However, the necessary and sufficient conditions for such decompositions remain an open question. In this paper, we address this question by studying continuous, nonnegative loss functions that satisfy the identity of indiscernibles (zero loss if and only if the two arguments are identical), under mild regularity conditions. We prove that so-called $g$-Bregman or $\rho$-$\tau$ divergences are the only such loss functions that admit a clean bias-variance decomposition. A $g$-Bregman divergence can be transformed into a standard Bregman divergence through an invertible change of variables. This makes the squared Mahalanobis distance, up to such a variable transformation, the only symmetric loss function with a clean bias-variance decomposition. Consequently, common loss functions such as the zero-one and $L_1$ losses cannot admit a clean bias-variance decomposition, which explains why previous attempts have failed. We also examine how relaxing these restrictions on the loss functions affects our results.
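To make concrete what a "clean" decomposition of this kind looks like, the following is a minimal sketch of the known decomposition for a Bregman divergence $D_\phi$. The generator $\phi$, the random label $Y$, the random prediction $\hat{Y}$, and the shorthands $\bar{y}$ and $\mathring{y}$ are illustrative notation introduced here, not taken from the text above, and the sketch assumes $\phi$ strictly convex and differentiable with $Y$ independent of $\hat{Y}$:
\begin{align*}
D_\phi(y,\hat{y}) &= \phi(y) - \phi(\hat{y}) - \bigl\langle \nabla\phi(\hat{y}),\, y - \hat{y} \bigr\rangle, \\
\mathbb{E}_{Y,\hat{Y}}\bigl[D_\phi(Y,\hat{Y})\bigr]
  &= \underbrace{\mathbb{E}_{Y}\bigl[D_\phi(Y,\bar{y})\bigr]}_{\text{noise}}
   \;+\; \underbrace{D_\phi(\bar{y},\mathring{y})}_{\text{bias}}
   \;+\; \underbrace{\mathbb{E}_{\hat{Y}}\bigl[D_\phi(\mathring{y},\hat{Y})\bigr]}_{\text{variance}},
\end{align*}
where $\bar{y} = \mathbb{E}[Y]$ is the mean label and $\mathring{y} = (\nabla\phi)^{-1}\bigl(\mathbb{E}[\nabla\phi(\hat{Y})]\bigr)$ is the dual mean of the predictions. For $\phi(y) = \|y\|^2$ the divergence $D_\phi$ is the squared error, $\mathring{y} = \mathbb{E}[\hat{Y}]$, and the three terms reduce to the familiar noise, squared bias, and variance.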