Given a sequence of observations $(x_1, y_1), \ldots, (x_n, y_n)$, Conformal Prediction (CP) is a methodology for estimating a confidence set for $y_{n+1}$ given $x_{n+1}$, assuming only that the joint distribution of the data is exchangeable. CP sets have guaranteed coverage for any finite sample size $n$. While appealing, the computation of such a set turns out to be infeasible in general, e.g., when the unknown variable $y_{n+1}$ is continuous. The bottleneck is that the procedure refits a prediction model on data in which the unknown target is replaced by each of its possible values, in order to select the most probable ones. This requires fitting an infinite number of models, which is often intractable. In this paper, we combine CP techniques with classical algorithmic stability bounds to derive a prediction set computable with a single model fit. We demonstrate that our proposed confidence set does not lose any coverage guarantees while avoiding the need for data splitting as currently done in the literature. We provide numerical experiments on both synthetic and real datasets to illustrate the tightness of our estimation when the sample size is sufficiently large.
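To make the computational bottleneck concrete, the following is a minimal sketch of full conformal prediction, not the single-fit method proposed in the paper. It rests on several illustrative assumptions: the continuous range of $y_{n+1}$ is approximated by a finite grid `candidates`, the prediction model is ridge regression, and the conformity score is the absolute residual. The function names `ridge_fit_predict` and `full_conformal_set` are hypothetical.

```python
import numpy as np


def ridge_fit_predict(X, y, X_query, lam=1.0):
    """Fit ridge regression on (X, y) and predict at X_query (illustrative model)."""
    d = X.shape[1]
    beta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return X_query @ beta


def full_conformal_set(X, y, x_new, candidates, alpha=0.1, lam=1.0):
    """Return the candidates z kept by full conformal prediction at level alpha.

    Assumption: the continuous target range is replaced by a finite grid,
    so the result only approximates the exact conformal set.
    """
    X_aug = np.vstack([X, x_new])  # append the new input as row n+1
    accepted = []
    for z in candidates:
        # Replace the unknown target by the candidate z and refit the model.
        y_aug = np.append(y, z)
        preds = ridge_fit_predict(X_aug, y_aug, X_aug, lam)
        scores = np.abs(y_aug - preds)  # absolute-residual conformity scores
        # Conformal p-value: share of points at least as nonconforming as (x_new, z).
        p_value = np.mean(scores >= scores[-1])
        if p_value > alpha:
            accepted.append(z)
    return np.array(accepted)
```

Each candidate value triggers one model fit, so the cost grows linearly with the grid size, and an exact treatment of a continuous target would need infinitely many fits. This loop is what a prediction set computable from a single model fit is meant to avoid.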