Using a mean-field theory of signal propagation, we analyze how the correlation between two signals evolves as they propagate forward through a deep ReLU network with correlated weights. In deep ReLU networks with uncorrelated weights, signals become highly correlated as depth increases. We show that ReLU networks with anti-correlated weights can avoid this fate and have a chaotic phase in which the signal correlations saturate below unity. Consistent with this analysis, we find that networks initialized with anti-correlated weights can train faster (in a teacher-student setting) by taking advantage of the increased expressivity in the chaotic phase. Combining this with a previously proposed strategy of using an asymmetric initialization to reduce the probability of dead nodes, we propose an initialization scheme that allows faster training and learning than the best-known initializations.
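The collapse of correlations in the uncorrelated baseline can be illustrated with a minimal NumPy sketch. It assumes i.i.d. zero-mean Gaussian weights with variance 2/N (He initialization), no biases, and, for the analytic part, infinite width. Under those assumptions the standard mean-field correlation map for one ReLU layer is c_{l+1} = (sqrt(1 - c_l^2) + (pi - arccos c_l) c_l) / pi, whose stable fixed point is c = 1. The function names are illustrative, and the sketch does not model the anti-correlated weights studied in this work; it only shows the behavior those weights are meant to avoid.

```python
import numpy as np


def relu_corr_map(c):
    """Infinite-width correlation map for one ReLU layer.

    Assumes i.i.d. zero-mean Gaussian weights and no biases. Given the
    correlation c of two pre-activations at layer l, returns their
    correlation at layer l + 1 (normalized degree-1 arc-cosine kernel).
    """
    c = np.clip(c, -1.0, 1.0)
    theta = np.arccos(c)
    return (np.sqrt(1.0 - c**2) + (np.pi - theta) * c) / np.pi


def iterate_map(c0, depth):
    """Apply the correlation map `depth` times; returns c_0, ..., c_depth."""
    cs = [float(c0)]
    for _ in range(depth):
        cs.append(float(relu_corr_map(cs[-1])))
    return np.array(cs)


def cosine(a, b):
    """Empirical correlation of two (approximately zero-mean) pre-activation vectors."""
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def simulate_network(c0, depth, width=2000, seed=0):
    """Finite-width check: propagate two inputs through one random ReLU net.

    Weights are i.i.d. N(0, 2 / width) (He initialization), no biases,
    and both signals share the same weight matrix at every layer.
    """
    rng = np.random.default_rng(seed)
    # Two Gaussian input vectors whose correlation is approximately c0.
    z1 = rng.standard_normal(width)
    z2 = rng.standard_normal(width)
    h1 = z1
    h2 = c0 * z1 + np.sqrt(1.0 - c0**2) * z2
    cs = [cosine(h1, h2)]
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
        # Next pre-activations: shared weights applied to ReLU outputs.
        h1 = W @ np.maximum(h1, 0.0)
        h2 = W @ np.maximum(h2, 0.0)
        cs.append(cosine(h1, h2))
    return np.array(cs)


if __name__ == "__main__":
    print(np.round(iterate_map(0.0, 10), 3))
    print(np.round(simulate_network(0.0, 10), 3))
```

Starting from orthogonal inputs (c_0 = 0), the analytic map gives c_1 = 1/pi (about 0.318) and then increases monotonically toward 1 with depth; `simulate_network` is a finite-width Monte Carlo check of the same trend under the same assumptions.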