Benchmark for Uncertainty & Robustness in Self-Supervised Learning

From arXiv; 15 pages, 3 tables, 6 figures; class project for CSCI 601.771: Self-supervised Statistical Models, Johns Hopkins University, Fall 2022

Self-Supervised Learning (SSL) is crucial for real-world applications, especially in data-hungry domains such as healthcare and self-driving cars. Besides a lack of labeled data, these applications also suffer from distribution shifts. To be considered reliable in such high-stakes domains, an SSL method should therefore provide both robust generalization and well-calibrated uncertainty estimates on the test data. However, existing approaches often focus on generalization without evaluating the model's uncertainty. The ability to compare SSL techniques on how well they improve these estimates is therefore critical for research on the reliability of self-supervised models. In this paper, we explore variants of SSL methods, including Jigsaw Puzzles, Context, Rotation, and Geometric Transformations Prediction for vision, as well as BERT and GPT for language tasks. We use SSL as an auxiliary learning task for vision and as pre-training for language models, then evaluate generalization (in- and out-of-distribution classification accuracy) and uncertainty (expected calibration error) across datasets with covariate distribution shift, including MNIST-C, CIFAR-10-C, CIFAR-10.1, and MNLI. Our goal is to create a benchmark from these experimental results that provides a starting point for new SSL methods in Reliable Machine Learning. All source code to reproduce the results is available at https://github.com/hamanhbui/reliable_ssl_baselines.
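The abstract measures uncertainty with expected calibration error (ECE). As a minimal sketch, assuming the common equal-width-binning formulation over top-1 confidences, ECE = Σ_b (|B_b| / n) · |acc(B_b) − conf(B_b)|, where B_b is the set of predictions whose confidence falls in bin b. The function name `expected_calibration_error` and the default of 15 bins are hypothetical choices for illustration; they are not taken from the paper or from the linked repository, whose exact binning is not stated here.

```python
import math
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    predictions: Sequence[int],
    labels: Sequence[int],
    n_bins: int = 15,  # assumption: bin count is illustrative, not from the paper
) -> float:
    """Equal-width-bin ECE over top-1 confidences (hypothetical helper)."""
    n = len(confidences)
    if n == 0 or not (n == len(predictions) == len(labels)):
        raise ValueError("inputs must be non-empty and of equal length")

    # Per-bin accumulators: sample count, summed confidence, number correct.
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    correct = [0] * n_bins

    for c, p, y in zip(confidences, predictions, labels):
        if not 0.0 <= c <= 1.0:
            raise ValueError("confidences must lie in [0, 1]")
        # Bin b covers (b/n_bins, (b+1)/n_bins]; a confidence of 0 goes to bin 0.
        b = max(0, math.ceil(c * n_bins) - 1)
        counts[b] += 1
        conf_sums[b] += c
        correct[b] += int(p == y)

    ece = 0.0
    for k, s, a in zip(counts, conf_sums, correct):
        if k:
            # Weight |bin accuracy - bin mean confidence| by the bin's sample share.
            ece += (k / n) * abs(a / k - s / k)
    return ece


# Two predictions at confidence 0.75, one right and one wrong:
# bin accuracy 0.5 vs. mean confidence 0.75, so ECE = 0.25.
print(expected_calibration_error([0.75, 0.75], [1, 2], [1, 0], n_bins=10))
```

A perfectly calibrated model scores 0; an overconfident one (e.g. confidence 1.0 on every wrong prediction) approaches 1. Equal-width bins are the simplest choice; adaptive, equal-mass bins are a common alternative when confidences cluster near 1.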

翻译：自强学习(SSL)对于现实世界的应用至关重要,特别是在保健和自驾驶汽车等数据饥饿领域。除了缺少标签数据外,这些应用还存在分布变化。因此,SSL方法应在测试数据集中提供有力的概括和不确定性估计,以在此类高取域中被视为可靠的模型。然而,现有方法往往侧重于概括化,而没有评估模型的不确定性。因此,比较SSL技术来改进这些估计数的能力对于自我监督模型的可靠性研究至关重要。在本文中,我们探讨SSL方法的变异,包括吉格锯图案、上下文、轮廓、几何变图案预测,以及用于语言任务的BERT和GPT。我们培训SSL为视觉和语言模型预培训提供辅助学习,然后评估不同分布式数据集的概括化(内部分类准确性)和不确定性(预期校准错误),包括MNIST-C、CIFAR-10-C、用于愿景的几何变形转换方法,以及用于SMAR-10和SMLSLA的所有基准数据源。我们开始的SAR-10和SBRMMR 数据是一个新的基准数据源。