In many machine learning applications, we are faced with incomplete datasets. In the literature, missing data imputation techniques have mostly been concerned with filling in missing values. However, missing values imply uncertainty not only over the distribution of the missing values but also over target class assignments, and both require careful consideration. The objectives of this paper are twofold. First, we propose a method for generating imputations from the conditional distribution of missing values given observed values. Second, we use the generated samples to estimate the distribution of target assignments given incomplete data. To generate imputations, we train a simple and effective generator network whose imputations a discriminator network is tasked with distinguishing. A predictor network is then trained on imputed samples from the generator network to capture the classification uncertainties and make predictions accordingly. The proposed method is evaluated on the CIFAR-10 image dataset as well as two real-world tabular classification datasets, under various missingness rates and structures. Our experimental results show the effectiveness of the proposed method in generating imputations, as well as in providing estimates of class uncertainty in classification tasks with missing values.
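The second objective, estimating the distribution of target assignments given incomplete data, can be illustrated with a minimal Monte Carlo sketch: draw several imputations for the missing entries, classify each completed sample, and average the class probabilities. This is not the paper's implementation. `toy_generator` and `toy_predictor` are hypothetical stand-ins for the trained generator and predictor networks, and the weights, feature values, and sample count are arbitrary assumptions chosen for illustration.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_generator(x, mask, rng):
    # Hypothetical stand-in for a trained generator network:
    # proposes a value for every feature from standard normal noise.
    return [rng.gauss(0.0, 1.0) for _ in x]


def impute(x, mask, rng):
    # Keep observed entries (mask == 1); fill missing entries (mask == 0)
    # with the generator's proposal.
    proposal = toy_generator(x, mask, rng)
    return [xi if mi == 1 else gi for xi, mi, gi in zip(x, mask, proposal)]


def toy_predictor(x_full):
    # Hypothetical stand-in for a trained predictor network:
    # a fixed linear layer followed by softmax (two classes, assumed weights).
    weights = [[1.0, -1.0, 0.5], [-1.0, 1.0, -0.5]]
    logits = [sum(w * v for w, v in zip(row, x_full)) for row in weights]
    return softmax(logits)


def class_distribution(x, mask, n_samples=200, seed=0):
    # Average class probabilities over several imputations of the same input.
    rng = random.Random(seed)
    totals = None
    for _ in range(n_samples):
        probs = toy_predictor(impute(x, mask, rng))
        totals = probs if totals is None else [t + p for t, p in zip(totals, probs)]
    return [t / n_samples for t in totals]


x = [0.8, 0.0, -0.3]  # 0.0 is only a placeholder at the missing position
mask = [1, 0, 1]      # 1 = observed, 0 = missing
probs = class_distribution(x, mask)
print(probs)
```

Averaging over imputations, rather than classifying a single filled-in sample, lets the spread of plausible missing values show up as uncertainty in the predicted class probabilities.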