We introduce a notion of usable information contained in the representation learned by a deep network, and use it to study how optimal representations for the task emerge during training. We show that the implicit regularization coming from training with Stochastic Gradient Descent with a high learning rate and small batch size plays an important role in learning minimal sufficient representations for the task. In the process of arriving at a minimal sufficient representation, we find that the content of the representation changes dynamically during training. In particular, we find that semantically meaningful but ultimately irrelevant information is encoded in the early transient dynamics of training, before being later discarded. In addition, we evaluate how perturbing the initial part of training impacts the learning dynamics and the resulting representations. We show these effects both on perceptual decision-making tasks inspired by the neuroscience literature and on standard image classification tasks.
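To make the notion concrete, the sketch below (ours, not the authors' code) estimates the usable information a representation Z carries about a label Y as the gap between the label entropy and the cross-entropy achieved by a trained decoder; the choice of a linear logistic-regression probe as the decoder family and the synthetic toy data are assumptions made purely for illustration.

```python
# Minimal illustrative sketch of probe-based "usable information":
# I_usable(Z -> Y) ~= H(Y) - H_probe(Y | Z), estimated on held-out data.
# The linear probe and the toy data below are assumptions for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss


def usable_information(z_train, y_train, z_test, y_test):
    """Estimate H(Y) - H_probe(Y | Z) in nats on held-out data."""
    # Entropy of Y under its empirical marginal (a decoder that ignores Z).
    classes, counts = np.unique(y_train, return_counts=True)
    p_y = counts / counts.sum()
    h_y = log_loss(y_test, np.tile(p_y, (len(y_test), 1)), labels=classes)

    # Conditional entropy of Y given Z, as achieved by a trained linear probe.
    probe = LogisticRegression(max_iter=1000).fit(z_train, y_train)
    h_y_given_z = log_loss(y_test, probe.predict_proba(z_test), labels=classes)

    # Larger values mean more label information is decodable from Z.
    return h_y - h_y_given_z


# Toy usage: a representation whose first coordinate carries the label signal
# and whose second coordinate is pure nuisance noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
z = np.c_[y + 0.5 * rng.normal(size=2000), rng.normal(size=2000)]
print(usable_information(z[:1000], y[:1000], z[1000:], y[1000:]))
```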