Dropout is a simple but effective regularization technique for achieving better generalization of deep neural networks (DNNs); hence it is widely used in tasks based on DNNs. During training, dropout randomly discards a portion of the neurons to avoid overfitting. This paper presents an enhanced dropout technique, which we call multi-sample dropout, that both accelerates training and improves generalization over the original dropout. The original dropout creates a randomly selected subset (called a dropout sample) from the input in each training iteration, whereas multi-sample dropout creates multiple dropout samples. The loss is calculated for each sample, and the sample losses are then averaged to obtain the final loss. This technique can be implemented easily, without a new operator, by duplicating the part of the network after the dropout layer while sharing the weights among the duplicated fully connected layers. Experimental results showed that multi-sample dropout significantly accelerates training by reducing the number of iterations until convergence on image classification tasks using the ImageNet, CIFAR-10, CIFAR-100, and SVHN datasets. Multi-sample dropout does not significantly increase the computation cost per iteration because most of the computation time is consumed in the convolution layers before the dropout layer, which are not duplicated. Experiments also showed that networks trained using multi-sample dropout achieved lower error rates and losses for both the training set and the validation set.
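The core idea described above (draw several dropout samples from the same features, pass each through a shared fully connected layer, and average the per-sample losses) can be sketched in a few lines. The following is a minimal illustration assuming a PyTorch setting; the class name MultiSampleDropoutHead and the values of num_samples and p are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiSampleDropoutHead(nn.Module):
    """Sketch of a multi-sample dropout classification head: dropout is applied
    several times to the same features, the fully connected layer is shared
    across all dropout samples, and the per-sample losses are averaged."""

    def __init__(self, in_features, num_classes, num_samples=8, p=0.5):
        super().__init__()
        self.num_samples = num_samples
        self.dropout = nn.Dropout(p)                     # each call draws a new random mask
        self.fc = nn.Linear(in_features, num_classes)    # weights shared among dropout samples

    def forward(self, features, targets=None):
        # One forward pass per dropout sample, reusing the same features and weights.
        logits_per_sample = [self.fc(self.dropout(features))
                             for _ in range(self.num_samples)]
        if targets is None:
            # At inference, dropout is disabled, so averaging the logits is
            # equivalent to a single ordinary forward pass.
            return torch.stack(logits_per_sample).mean(dim=0)
        # Training: average the per-sample losses to obtain the final loss.
        loss = torch.stack([F.cross_entropy(logits, targets)
                            for logits in logits_per_sample]).mean()
        return loss
```

Because the dropout layer and the shared classifier sit after the convolutional feature extractor, the features are computed only once per iteration; this is why the additional dropout samples add little per-iteration cost in the setup the abstract describes.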