What Is the Optimal Ranking Score Between Precision and Recall? We Can Always Find It and It Is Rarely $F_1$

Ranking methods or models by their performance is of prime importance but is tricky because performance is fundamentally multidimensional. In the case of classification, precision and recall are scores with probabilistic interpretations that are both important to consider and complementary. The rankings induced by these two scores are often in partial contradiction. In practice, therefore, it is extremely useful to establish a compromise between the two views to obtain a single, global ranking. Over the last fifty years or so, it has been proposed to take a weighted harmonic mean, known as the F-score, F-measure, or $F_β$. Generally speaking, by averaging basic scores, we obtain a score whose value is intermediate between them. However, there is no guarantee that these scores lead to meaningful rankings and no guarantee that the rankings are good tradeoffs between these base scores. Given the ubiquity of $F_β$ scores in the literature, some clarification is in order. Concretely: (1) We establish that $F_β$-induced rankings are meaningful and define a shortest path between precision- and recall-induced rankings. (2) We frame the problem of finding a tradeoff between two scores as an optimization problem expressed with Kendall rank correlations. We show that $F_1$ and its skew-insensitive version are far from being optimal in that regard. (3) We provide theoretical tools and a closed-form expression to find the optimal value for $β$ for any distribution or set of performances, and we illustrate their use on six case studies.
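To make the idea in point (2) concrete, the following is a minimal sketch, not the paper's closed-form method. It computes $F_β$ as the usual weighted harmonic mean, $F_β = (1+β^2)PR/(β^2 P + R)$, ranks a small set of classifiers by it, and measures the Kendall rank correlation of that ranking with the precision-induced and recall-induced rankings for a few values of $β$. The `performances` list is made-up illustration data, the helper names are hypothetical, and taking the smaller of the two correlations as a "balance" indicator is an assumption of this sketch; the abstract does not state the exact objective.

```python
from itertools import combinations


def f_beta(precision, recall, beta):
    """Weighted harmonic mean of precision and recall (the F_beta score)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant pairs - discordant pairs) / all pairs."""
    pairs = list(combinations(range(len(xs)), 2))
    total = 0
    for i, j in pairs:
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        total += (s > 0) - (s < 0)  # +1 concordant, -1 discordant, 0 tie
    return total / len(pairs)


# Hypothetical (precision, recall) pairs for six classifiers; not from the paper.
performances = [
    (0.90, 0.40),
    (0.80, 0.55),
    (0.70, 0.60),
    (0.60, 0.80),
    (0.50, 0.85),
    (0.75, 0.75),
]


def tradeoff(beta, perfs=performances):
    """Correlation of the F_beta ranking with the precision and recall rankings."""
    p = [pr for pr, _ in perfs]
    r = [rc for _, rc in perfs]
    f = [f_beta(pr, rc, beta) for pr, rc in perfs]
    return kendall_tau(f, p), kendall_tau(f, r)


# Scan a few beta values; the min of the two correlations is this sketch's
# assumed notion of a balanced tradeoff, not the paper's stated criterion.
for beta in [0.25, 0.5, 1.0, 2.0, 4.0]:
    tau_p, tau_r = tradeoff(beta)
    print(f"beta={beta:<5} tau(F,P)={tau_p:+.3f} tau(F,R)={tau_r:+.3f} "
          f"min={min(tau_p, tau_r):+.3f}")
```

A grid scan like this only approximates the best $β$ for one finite set of performances; the paper reports a closed-form expression that would replace the scan.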

