Improving existing neural network architectures can involve several design choices, such as manipulating the loss functions, employing a diverse learning strategy, exploiting gradient evolution at training time, optimizing the network hyper-parameters, or increasing the architecture depth. The latter approach is a straightforward solution, since it directly enhances the representation capabilities of a network; however, increased depth generally incurs the well-known vanishing gradient problem. In this paper, borrowing from different methods that address this issue, we introduce an interlaced multi-task learning strategy, named SIRe, to reduce the vanishing gradient in relation to the object classification task. The presented methodology directly improves a convolutional neural network (CNN) by enforcing the preservation of the input image structure through interlaced auto-encoders, and further refines the base network architecture by means of skip and residual connections. To validate the presented methodology, a simple CNN and various implementations of well-known networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset, where the SIRe-extended architectures achieve significantly improved performance across all models, thus confirming the effectiveness of the presented approach.
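The abstract does not spell out implementation details, but the general idea can be illustrated with a minimal, hypothetical PyTorch-style sketch: a convolutional block with a skip/residual connection plus an auxiliary auto-encoder head, whose reconstruction loss is added to the classification loss as an interlaced multi-task signal. All module names, the reconstruction targets, and the loss weighting below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch (not the paper's code): a conv block with a residual
# connection and an auxiliary decoder head; the reconstruction losses act as
# the interlaced multi-task signal alongside the classification loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SIReBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.skip = nn.Conv2d(in_ch, out_ch, 1)                 # skip/residual path
        self.decoder = nn.Conv2d(out_ch, in_ch, 3, padding=1)   # auto-encoder head

    def forward(self, x):
        h = F.relu(self.conv1(x))
        h = self.conv2(h)
        out = F.relu(h + self.skip(x))   # residual connection
        recon = self.decoder(out)        # reconstruct the block's input
        return out, recon

class TinySIReCNN(nn.Module):
    def __init__(self, num_classes=100):
        super().__init__()
        self.block1 = SIReBlock(3, 32)
        self.block2 = SIReBlock(32, 64)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        h1, rec1 = self.block1(x)
        x2 = F.max_pool2d(h1, 2)
        h2, rec2 = self.block2(x2)
        logits = self.fc(self.pool(h2).flatten(1))
        # each auxiliary decoder targets its own block's input
        return logits, [(rec1, x), (rec2, x2.detach())]

def sire_loss(logits, targets, recon_pairs, alpha=0.1):
    # classification loss plus weighted reconstruction losses
    # (alpha is an assumed weighting, not taken from the paper)
    loss = F.cross_entropy(logits, targets)
    for recon, ref in recon_pairs:
        loss = loss + alpha * F.mse_loss(recon, ref)
    return loss
```

In this sketch the reconstruction terms are computed at every depth, so gradient signal reaches early layers through both the residual paths and the auxiliary decoders; whether the decoders target intermediate features or the original image is a detail the abstract leaves open.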