The Intrinsic Dimension of Images and Its Impact on Learning

It is widely believed that natural image data exhibits low-dimensional structure despite the high dimensionality of conventional pixel representations. This idea underlies a common intuition for the remarkable success of deep learning in computer vision. In this work, we apply dimension estimation tools to popular datasets and investigate the role of low-dimensional structure in deep learning. We find that common natural image datasets do have very low intrinsic dimension relative to the number of pixels in their images. We also find that low-dimensional datasets are easier for neural networks to learn, and that models trained on them generalize better from training data to test data. Along the way, we develop a technique for validating our dimension estimation tools on synthetic data generated by GANs, which lets us actively manipulate the intrinsic dimension by controlling the image generation process. Code for our experiments is available at https://github.com/ppope/dimensions.
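The abstract does not say which dimension estimator is used, so the following is only an illustrative sketch of the general idea: estimate intrinsic dimension from nearest-neighbor distances, then check the estimator on synthetic data whose intrinsic dimension is known by construction. It assumes the Levina-Bickel maximum-likelihood estimator with per-point inverse estimates averaged before inverting, and it swaps the GAN-generated images for a much simpler stand-in: Gaussian samples on a random linear subspace. The function names and parameters (`mle_intrinsic_dimension`, `synthetic_subspace_data`, `k`) are hypothetical and are not taken from the authors' repository.

```python
import numpy as np


def mle_intrinsic_dimension(X, k=20):
    """Nearest-neighbor MLE of intrinsic dimension (assumed estimator, see lead-in)."""
    X = np.asarray(X, dtype=np.float64)

    # Pairwise squared Euclidean distances via the Gram matrix.
    sq_norms = np.sum(X**2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)      # clamp tiny negatives from round-off
    np.fill_diagonal(d2, np.inf)     # exclude each point from its own neighbors

    # Squared distances to the k nearest neighbors, sorted ascending.
    knn_d2 = np.sort(d2, axis=1)[:, :k]

    # log(T_k / T_j) for j = 1..k-1, computed from squared distances.
    log_ratios = 0.5 * np.log(knn_d2[:, -1:] / knn_d2[:, :-1])

    # Per-point inverse dimension estimate, averaged over points, then inverted.
    inverse_local = log_ratios.mean(axis=1)
    return 1.0 / inverse_local.mean()


def synthetic_subspace_data(n, intrinsic_dim, ambient_dim, seed=0):
    """Gaussian samples on a random linear subspace (a simple stand-in for GAN images)."""
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal((n, intrinsic_dim))
    # Orthonormal basis of a random intrinsic_dim-dimensional subspace.
    basis, _ = np.linalg.qr(rng.standard_normal((ambient_dim, intrinsic_dim)))
    return latent @ basis.T


# 1000 points with 5 degrees of freedom, represented with 64 coordinates each.
X = synthetic_subspace_data(n=1000, intrinsic_dim=5, ambient_dim=64)
estimate = mle_intrinsic_dimension(X, k=20)
print(f"ambient dimension: {X.shape[1]}, estimated intrinsic dimension: {estimate:.2f}")
```

On data like this, the estimate should land near the number of latent coordinates rather than the ambient dimension, which is the kind of sanity check that controlling the generation process makes possible.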
