Regression tasks, particularly in safety-critical domains, require reliable uncertainty quantification, yet the literature remains largely focused on classification. To address this gap, we introduce a family of measures for total, aleatoric, and epistemic uncertainty based on proper scoring rules, with particular emphasis on kernel scores. The framework unifies several well-known measures and provides a principled recipe for designing new ones whose behavior, such as tail sensitivity, robustness, and out-of-distribution responsiveness, is governed by the choice of kernel. We prove explicit correspondences between kernel-score characteristics and downstream behavior, yielding concrete design guidelines for task-specific measures. Extensive experiments demonstrate that these measures are effective in downstream tasks and reveal clear trade-offs among instantiations, including in robustness and out-of-distribution detection performance.
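To make the idea of a kernel-score-based uncertainty measure concrete, the following is a minimal sketch and not the paper's implementation. It assumes the second-order distribution is represented by an ensemble whose members each provide samples from their predictive distribution, uses the kernel k(x, y) = |x - y|^beta (beta = 1 corresponds to the CRPS), and takes total uncertainty as the kernel entropy of the pooled mixture, aleatoric uncertainty as the average member entropy, and epistemic uncertainty as their difference. The function names `energy_kernel`, `kernel_entropy`, and `decompose` are hypothetical.

```python
import numpy as np


def energy_kernel(x, y, beta=1.0):
    """Pairwise kernel k(x, y) = |x - y|**beta for 1-D sample arrays (assumed choice)."""
    return np.abs(x[:, None] - y[None, :]) ** beta


def kernel_entropy(samples, kernel=energy_kernel):
    """Plug-in estimate of the kernel-score entropy 0.5 * E[k(X, X')]."""
    return 0.5 * float(kernel(samples, samples).mean())


def decompose(member_samples, kernel=energy_kernel):
    """Return (total, aleatoric, epistemic) for an ensemble.

    member_samples: array of shape (M, n), n samples from each of M members.
    Assumes equal member weights, so pooling all samples represents the mixture.
    """
    member_samples = np.asarray(member_samples, dtype=float)
    # Aleatoric: average entropy of the individual member distributions.
    aleatoric = float(np.mean([kernel_entropy(s, kernel) for s in member_samples]))
    # Total: entropy of the equally weighted mixture (pooled samples).
    total = kernel_entropy(member_samples.reshape(-1), kernel)
    # Epistemic: the remainder, which grows as members disagree.
    epistemic = total - aleatoric
    return total, aleatoric, epistemic


rng = np.random.default_rng(0)
agree = rng.normal(0.0, 1.0, size=(5, 200))      # members share one distribution
disagree = agree + 3.0 * np.arange(5)[:, None]   # members shifted apart

print("agreeing ensemble:   ", decompose(agree))
print("disagreeing ensemble:", decompose(disagree))
```

Under these assumptions, shifting the members apart leaves the aleatoric term unchanged (the kernel depends only on differences) while the epistemic term increases. Passing a different kernel, for instance `lambda x, y: energy_kernel(x, y, beta=0.5)`, changes how strongly large deviations are weighted, which is the kind of kernel-dependent behavior the abstract refers to.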