The rank of a neural network measures the information flowing across its layers. It is an instance of a key structural condition that applies across broad domains of machine learning. In particular, the assumption of low-rank feature representations has driven algorithmic developments in many architectures. For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear. To fill this gap, we perform a rigorous study of the behavior of network rank, focusing particularly on the notion of rank deficiency. We theoretically establish a universal monotonically decreasing property of network rank from the basic rules of differential and algebraic composition, and uncover rank deficiency of network blocks and deep function coupling. Using our numerical tools, we provide the first empirical analysis of the per-layer behavior of network rank in practical settings, namely ResNets, deep MLPs, and Transformers on ImageNet. These empirical results are in direct accord with our theory. Furthermore, we reveal a novel phenomenon of independence deficit caused by the rank deficiency of deep networks, where the classification confidence of a given category can be linearly determined by the confidence of a handful of other categories. The theoretical results of this work, together with the empirical findings, may advance understanding of the inherent principles of deep neural networks.
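The two claims above, that rank cannot increase under composition and that rank deficiency makes some class scores linearly dependent on others, can be illustrated with a toy sketch. This is not the paper's numerical tool or experimental setup: it assumes purely linear layers with random Gaussian weights, a bias-free linear classification head, and NumPy's default tolerance for numerical rank. The function names `layer_jacobian_ranks` and `independence_deficit_residual` are hypothetical.

```python
import numpy as np


def layer_jacobian_ranks(widths, seed=0):
    """Rank of the accumulated Jacobian after each layer of a toy linear network.

    Assumption: every layer is a random linear map, so the chain rule gives the
    Jacobian of the composition as a product of weight matrices, and
    rank(A @ B) <= min(rank(A), rank(B)).
    """
    rng = np.random.default_rng(seed)
    jac = np.eye(widths[0])  # Jacobian of the identity map on the input
    ranks = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        w = rng.standard_normal((d_out, d_in))
        jac = w @ jac  # chain rule: multiply in the new layer's Jacobian
        ranks.append(int(np.linalg.matrix_rank(jac)))
    return ranks


def independence_deficit_residual(n_samples=200, feat_dim=8, n_classes=20, seed=0):
    """Worst-case error when predicting one class logit from a few others.

    Assumption: logits = features @ head.T with feat_dim < n_classes, so the
    logit matrix has rank at most feat_dim and class 0 is (generically) a
    linear combination of any feat_dim other classes.
    """
    rng = np.random.default_rng(seed)
    feats = rng.standard_normal((n_samples, feat_dim))
    head = rng.standard_normal((n_classes, feat_dim))
    logits = feats @ head.T
    target = logits[:, 0]
    others = logits[:, 1:feat_dim + 1]  # a handful of other classes
    coef, *_ = np.linalg.lstsq(others, target, rcond=None)
    return float(np.max(np.abs(others @ coef - target)))


if __name__ == "__main__":
    # Widths narrow to 16 and widen again; the rank does not recover.
    print(layer_jacobian_ranks([64, 32, 16, 32, 64]))
    print(independence_deficit_residual())
```

In this sketch the rank lost at the 16-unit bottleneck is never regained by the wider layers that follow, and the residual of the least-squares fit is at the level of floating-point error. Real networks are nonlinear, so the abstract's statements concern the general setting rather than this linear simplification.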