We introduce a principled approach to detecting out-of-distribution (OOD) data by exploiting a connection to data curation. In data curation, ambiguous or difficult-to-classify inputs are excluded from the dataset, and these excluded inputs are by definition OOD. We can therefore obtain a likelihood for OOD points from a principled generative model of data curation, originally developed to explain the cold-posterior effect in Bayesian neural networks (Aitchison 2020). This model assigns higher OOD probabilities when predictive uncertainty is higher, and it can be trained by maximum likelihood jointly over the in-distribution and OOD points. The approach outperforms past methods that did not provide a probability for OOD points and therefore could not be trained by maximum likelihood.
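The abstract does not spell out the likelihood, so the following is only a minimal sketch under a stated assumption: a consensus form of the curation model, in which `num_annotators` hypothetical annotators each label an input independently from the classifier's predictive distribution, and the input is kept only if they all agree. Under that assumption an in-distribution point with label y has likelihood p(y|x)^S, and an OOD (excluded) point has likelihood 1 - sum_y p(y|x)^S, which grows as the predictive distribution becomes more uncertain. All function and parameter names below are illustrative and are not taken from the paper.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def ood_probability(logits, num_annotators=2):
    """Assumed curation model: P(OOD | x) = 1 - sum_y p(y|x)^S."""
    scaled = num_annotators * log_softmax(logits)  # log p(y|x)^S
    m = scaled.max(axis=-1, keepdims=True)
    # log sum_y p(y|x)^S via a stable log-sum-exp
    log_consensus = (m + np.log(np.exp(scaled - m).sum(axis=-1, keepdims=True))).squeeze(-1)
    return -np.expm1(log_consensus)  # 1 - exp(log_consensus)


def joint_nll(logits_in, labels_in, logits_ood, num_annotators=2):
    """Negative log-likelihood summed over in-distribution and OOD points."""
    logp = log_softmax(logits_in)
    # In-distribution: all S annotators agree on the observed label.
    ll_in = num_annotators * logp[np.arange(len(labels_in)), labels_in]
    # OOD: the annotators fail to reach consensus (clipped to avoid log(0)).
    p_ood = np.clip(ood_probability(logits_ood, num_annotators), 1e-12, None)
    return -(ll_in.sum() + np.log(p_ood).sum())


# A uniform prediction is more likely to be flagged OOD than a confident one.
uncertain = np.zeros((1, 10))
confident = np.array([[10.0] + [0.0] * 9])
print(ood_probability(uncertain), ood_probability(confident))
```

Because both kinds of point receive a proper probability under this assumed model, the two log-likelihood terms can simply be added and minimised together, which is the joint maximum-likelihood training the abstract describes.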