In machine learning, a lack of sufficient samples means the trained model approximates the ground-truth function with low confidence. Only recently, with the introduction of generative adversarial networks (GANs), has data augmentation (DA) of small-sample datasets with realistic fake data become a promising option, and many works have validated the viability of GAN-based DA. Although most of these works report that GAN-based DA yields higher accuracy, some researchers stress that the fake data generated by a GAN carries inherent bias. In this paper, we explore when that bias is low enough that it does not hurt performance. We run experiments to characterize the bias under different GAN-based DA settings and, based on the results, design a pipeline to check whether a specific dataset can be efficiently augmented with GAN-based DA. Finally, drawing on our attempts to reduce the bias, we offer advice for mitigating bias in applications of GAN-based DA.