Conformal prediction provides a framework for uncertainty quantification, specifically in the form of prediction intervals and prediction sets with distribution-free coverage guarantees. While recent cross-conformal techniques such as CV+ and Jackknife+-after-bootstrap achieve better data efficiency than traditional split conformal methods, they incur substantial computational costs because they require pairwise comparisons between the out-of-bag scores of training and test samples. Observing that these methods extend naturally to ensemble models, particularly random forests, we leverage existing optimized random forest implementations to enable efficient cross-conformal prediction. We present coverforest, a Python package that implements efficient conformal prediction methods optimized specifically for random forests. coverforest supports both regression and classification tasks through various conformal prediction methods, including split conformal, CV+, Jackknife+-after-bootstrap, and adaptive prediction sets. The package uses parallel computing and Cython optimizations to speed up out-of-bag calculations. Our experiments demonstrate that coverforest's predictions achieve the desired level of coverage. In addition, its training and prediction can be 2 to 9 times faster than those of an existing implementation. The source code for coverforest is hosted on GitHub at https://github.com/donlap/coverforest.
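To make the pairwise out-of-bag cost concrete, here is a minimal sketch of Jackknife+-after-bootstrap for regression. It assumes NumPy and scikit-learn are installed and hand-rolls a random forest from `DecisionTreeRegressor` so that the in-bag masks are explicit. The function `jackknife_plus_after_bootstrap` and its parameters are hypothetical and are not coverforest's API. The sketch aggregates out-of-bag trees by their mean, uses a fixed number of trees, and omits refinements from the original method's coverage analysis.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def jackknife_plus_after_bootstrap(X, y, X_test, n_trees=100, alpha=0.1, seed=0):
    """Hypothetical helper: J+aB intervals from a hand-rolled random forest."""
    rng = np.random.default_rng(seed)
    n = len(y)
    in_bag = np.zeros((n_trees, n), dtype=bool)
    train_preds = np.empty((n_trees, n))
    test_preds = np.empty((n_trees, len(X_test)))
    for b in range(n_trees):
        idx = rng.integers(0, n, size=n)  # bootstrap sample with replacement
        in_bag[b, idx] = True
        # Per-split feature subsampling turns the bagged trees into a random forest.
        tree = DecisionTreeRegressor(max_features="sqrt",
                                     random_state=int(rng.integers(2**31 - 1)))
        tree.fit(X[idx], y[idx])
        train_preds[b] = tree.predict(X)
        test_preds[b] = tree.predict(X_test)

    oob = ~in_bag                      # oob[b, i]: tree b never saw sample i
    counts = oob.sum(axis=0)
    keep = counts > 0                  # skip samples that appeared in every bootstrap
    oob, counts = oob[:, keep], counts[keep]
    # Leave-one-out fit mu_{-i}(x_i): mean over the trees that did not see sample i.
    loo_train = (train_preds[:, keep] * oob).sum(axis=0) / counts
    resid = np.abs(y[keep] - loo_train)  # out-of-bag scores R_i
    # mu_{-i}(x) for every (training i, test x) pair: the pairwise step.
    loo_test = (oob.T.astype(float) @ test_preds) / counts[:, None]

    m = resid.size
    lo_vals = np.sort(loo_test - resid[:, None], axis=0)
    hi_vals = np.sort(loo_test + resid[:, None], axis=0)
    k_lo = int(np.floor(alpha * (m + 1)))       # rank of the lower quantile
    k_hi = int(np.ceil((1 - alpha) * (m + 1)))  # rank of the upper quantile
    n_test = test_preds.shape[1]
    lower = lo_vals[k_lo - 1] if k_lo >= 1 else np.full(n_test, -np.inf)
    upper = hi_vals[k_hi - 1] if k_hi <= m else np.full(n_test, np.inf)
    return lower, upper
```

The `loo_test` matrix has one row per training sample and one column per test point, so both time and memory grow with their product. This is the kind of out-of-bag computation that the abstract describes speeding up with parallelism and Cython.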
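For classification, the abstract lists adaptive prediction sets (APS) among the supported methods. The following minimal split-conformal sketch implements the non-randomized APS variant. It assumes class-probability estimates (for example, from a random forest's `predict_proba`) are already available as NumPy arrays. The helper names `aps_scores`, `aps_threshold`, and `aps_sets` are hypothetical and do not reflect coverforest's API. Each calibration score is the total probability of the classes ranked at or above the true label. A test set then adds classes in descending probability until that mass reaches the calibrated threshold.

```python
import numpy as np


def aps_scores(probs, labels):
    """Total probability of classes ranked at or above the true label."""
    order = np.argsort(-probs, axis=1)               # classes by descending probability
    sorted_p = np.take_along_axis(probs, order, axis=1)
    cum = np.cumsum(sorted_p, axis=1)
    rank = np.argmax(order == labels[:, None], axis=1)  # position of the true label
    return cum[np.arange(len(labels)), rank]


def aps_threshold(cal_probs, cal_labels, alpha=0.1):
    """Conformal quantile of the calibration scores."""
    scores = aps_scores(cal_probs, cal_labels)
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")


def aps_sets(test_probs, qhat):
    """Boolean (n_test, n_classes) matrix marking the classes in each prediction set."""
    order = np.argsort(-test_probs, axis=1)
    sorted_p = np.take_along_axis(test_probs, order, axis=1)
    mass_before = np.cumsum(sorted_p, axis=1) - sorted_p
    include_sorted = mass_before < qhat  # keep adding classes until qhat is reached
    include = np.zeros_like(include_sorted)
    np.put_along_axis(include, order, include_sorted, axis=1)
    return include
```

This variant never returns an empty set and tends to cover slightly more than the target level. A cross-conformal version would compute the scores from out-of-bag probabilities rather than from a held-out calibration split.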