Constructing reliable prediction sets is an obstacle to applying neural models: distribution-free conditional coverage is theoretically impossible, and the exchangeability assumption underpinning the coverage guarantees of standard split-conformal approaches is violated under domain shift. Given these challenges, we propose and analyze a data-driven procedure for obtaining empirically reliable approximate conditional coverage, calculating a unique quantile threshold for each label for each test point. We achieve this via the strong signals for prediction reliability provided by KNN-based approximations of the model over the training set, together with approximations over constrained samples from the held-out calibration set. We demonstrate that split-conformal alternatives with marginal coverage guarantees can exhibit substantial (and otherwise unknowable) under-coverage when these distances and constraints are not taken into account. Our experiments span protein secondary structure prediction, grammatical error detection, sentiment classification, and fact verification, covering supervised sequence labeling, zero-shot sequence labeling (i.e., feature detection), document classification (with sparsity/interpretability constraints), and retrieval-classification, including class-imbalanced and domain-shifted settings.
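For orientation, the split-conformal baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure proposed here: it uses the common nonconformity score of one minus the softmax probability of a label, and it computes either a single marginal threshold or one threshold per label from the calibration set. It does not include the KNN-based distances, the constrained calibration samples, or per-test-point thresholds. All function names are hypothetical.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """k-th smallest calibration score, k = ceil((n + 1) * (1 - alpha)).

    Returns infinity when the calibration sample is too small to
    support the requested level, which yields the full label set.
    """
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    if n == 0 or k > n:
        return math.inf
    return float(scores[k - 1])

def true_label_scores(probs, labels):
    """Assumed nonconformity score: 1 - probability of the true label."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    return 1.0 - probs[np.arange(len(labels)), labels]

def marginal_threshold(cal_probs, cal_labels, alpha):
    """Single threshold shared by all labels (marginal coverage)."""
    return conformal_quantile(true_label_scores(cal_probs, cal_labels), alpha)

def label_conditional_thresholds(cal_probs, cal_labels, alpha):
    """One threshold per label, from calibration points with that label."""
    cal_labels = np.asarray(cal_labels)
    scores = true_label_scores(cal_probs, cal_labels)
    n_classes = np.asarray(cal_probs).shape[1]
    return np.array([
        conformal_quantile(scores[cal_labels == c], alpha)
        for c in range(n_classes)
    ])

def prediction_set(test_prob, thresholds):
    """Labels whose score falls at or below the (scalar or per-label) threshold."""
    test_prob = np.asarray(test_prob, dtype=float)
    return np.flatnonzero(1.0 - test_prob <= thresholds).tolist()

# Toy calibration set: 4 points, 2 classes (illustrative values only).
cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]
cal_labels = [0, 0, 1, 1]
per_label = label_conditional_thresholds(cal_probs, cal_labels, alpha=0.5)
print(prediction_set([0.85, 0.15], per_label))
```

The guarantees attached to thresholds of this kind rest on exchangeability between calibration and test data, and the marginal variant holds only on average over test points, which is the gap the abstract targets.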