We contribute to the study of the quality of learned representations. In many domains, an important evaluation criterion for safe and trustworthy deep learning is how well the invariances captured by the representations of deep neural networks (DNNs) are shared with humans. We identify challenges in measuring these invariances. Prior works used gradient-based methods to generate \textit{identically represented inputs} (IRIs), \ie, inputs that a neural network maps to similar representations (at a given layer). If these IRIs look `similar' to humans, then the network's learned invariances are said to align with human perception. However, we show that prior studies on the alignment of invariances between DNNs and humans are `biased' by the specific loss function used to generate IRIs, and that different loss functions can lead to different takeaways about a model's shared invariances with humans. In particular, under an \textit{adversarial} IRI~generation process, all models appear to share very little invariance with humans. We then conduct an in-depth investigation of how different components of the deep learning pipeline contribute to learning models that align well with human invariances. We find that architectures with residual connections trained using a self-supervised contrastive loss with $\ell_p$ ball adversarial data augmentation tend to learn the most human-like invariances.
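To make the IRI generation step concrete, the following is a minimal sketch of a gradient-based objective of the kind referred to above; the squared $\ell_2$ representation distance, the layer index $\ell$, and the seed input $x_0$ are illustrative assumptions rather than the exact formulation used in prior work or in this paper.
% Hedged sketch (assumed notation): f_\ell is the representation at layer \ell,
% x is a reference input, and x_0 is the seed from which optimization starts.
\begin{equation*}
    \tilde{x} \;=\; \operatorname*{arg\,min}_{x'} \; \bigl\lVert f_\ell(x') - f_\ell(x) \bigr\rVert_2^2 ,
    \qquad \text{optimization started from a seed } x_0 ,
\end{equation*}
so that $\tilde{x}$ and $x$ form an IRI pair for layer $\ell$. Under this reading, the \textit{adversarial} generation process mentioned above would additionally choose or constrain the seed $x_0$ (\eg, within an $\ell_p$ ball) so as to stress-test the claimed invariance; the specific loss and constraint set are what we argue can bias the resulting conclusions.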