Labeled data is essential for training supervised deep learning models for computer vision tasks. However, the labeling process is prone to error and noise, especially in ordinal image classification, where class boundaries are often ambiguous. Such label noise can significantly degrade the performance and reliability of machine learning models. This paper addresses the problem of detecting and correcting label noise in ordinal image classification tasks. To this end, a novel data-centric method called ORDinal Adaptive Correction (ORDAC) is proposed for adaptive correction of noisy labels. The proposed approach uses Label Distribution Learning (LDL) to model the inherent ambiguity and uncertainty present in ordinal labels. During training, ORDAC dynamically adjusts the mean and standard deviation of the label distribution for each sample. Rather than discarding potentially noisy samples, this approach aims to correct them and so make use of the entire training dataset. The effectiveness of the proposed method is evaluated on benchmark datasets for age estimation (Adience) and disease severity detection (Diabetic Retinopathy) under various asymmetric Gaussian noise scenarios. Results show that ORDAC and its extended versions (ORDAC_C and ORDAC_R) lead to significant improvements in model performance. For instance, on the Adience dataset with 40% noise, ORDAC_R reduced the mean absolute error from 0.86 to 0.62 and increased recall from 0.37 to 0.49. The method was also effective at correcting intrinsic noise present in the original datasets. These results indicate that adaptive label correction using label distributions is an effective strategy for improving the robustness and accuracy of ordinal classification models in the presence of noisy data.
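To make the label-distribution idea concrete, the following is a minimal sketch of how a per-sample mean and standard deviation can define a soft target over ordinal classes. It is an illustration under stated assumptions, not the ORDAC implementation: the abstract does not specify the discretization, the loss, or the rule by which the mean and standard deviation are updated, so the function names (`gaussian_label_distribution`, `kl_divergence`), the discretized-Gaussian form, and the KL loss are hypothetical choices commonly used in LDL.

```python
import numpy as np


def gaussian_label_distribution(mean, std, num_classes):
    """Discretize a Gaussian over class indices 0..num_classes-1 and normalize.

    Assumption: classes are equally spaced integers; the abstract does not
    state how ORDAC discretizes its label distributions.
    """
    classes = np.arange(num_classes, dtype=np.float64)
    log_weights = -0.5 * ((classes - mean) / std) ** 2
    log_weights -= log_weights.max()  # subtract the max for numerical stability
    weights = np.exp(log_weights)
    return weights / weights.sum()


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted), a common LDL training loss (assumed here)."""
    target = np.clip(target, eps, 1.0)
    predicted = np.clip(predicted, eps, 1.0)
    return float(np.sum(target * (np.log(target) - np.log(predicted))))


# Illustrative values only: 8 ordinal classes, a label centered on class 3.
confident = gaussian_label_distribution(mean=3.0, std=1.0, num_classes=8)
uncertain = gaussian_label_distribution(mean=3.0, std=2.0, num_classes=8)

# A larger standard deviation spreads mass to neighboring classes,
# which is how a distribution can encode ambiguity about a label.
print(np.round(confident, 3))
print(np.round(uncertain, 3))
print(kl_divergence(confident, uncertain))
```

In a scheme of this kind, shifting `mean` corresponds to correcting a suspected wrong label, while changing `std` expresses how much the label is trusted. How ORDAC chooses those adjustments during training is described in the paper itself and is not reproduced here.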