We prove sharp dimension-free representation results for neural networks with $D$ ReLU layers under square loss for a class of functions $\mathcal{G}_D$ defined in the paper. These results capture the precise benefits of depth in the following sense: 1. The rates for representing the class of functions $\mathcal{G}_D$ via $D$ ReLU layers are sharp up to constants, as shown by matching lower bounds. 2. For each $D$, $\mathcal{G}_{D} \subseteq \mathcal{G}_{D+1}$, and as $D$ grows the class $\mathcal{G}_{D}$ contains progressively less smooth functions. 3. If $D^{\prime} < D$, then the approximation rate for the class $\mathcal{G}_D$ achieved by depth-$D^{\prime}$ networks is strictly worse than that achieved by depth-$D$ networks. This constitutes a fine-grained characterization of the representation power of feedforward networks of arbitrary depth $D$ and number of neurons $N$, in contrast to existing representation results, which either require $D$ to grow quickly with $N$ or assume that the function being represented is highly smooth. In the latter case, similar rates can be obtained with a single nonlinear layer. Our results confirm the prevailing hypothesis that deeper networks are better at representing less smooth functions, and indeed the main technical novelty is to fully exploit the fact that deep networks can produce highly oscillatory functions with few activation functions.
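The closing remark, that deep networks can produce highly oscillatory functions with few activation functions, can be illustrated with the well-known tent-map composition. The sketch below is a standard textbook example and is not the construction used in the paper; the names `tent_layer`, `deep_sawtooth`, and `count_peaks` are hypothetical. Each layer uses two ReLU units, so a depth-$D$ network uses $2D$ ReLUs in total while the number of peaks on $[0, 1]$ doubles with every added layer.

```python
def relu(z):
    """Rectified linear unit on a scalar."""
    return max(z, 0.0)


def tent_layer(x):
    """One hidden layer with two ReLU units.

    On [0, 1] this equals 2x for x <= 1/2 and 2 - 2x for x >= 1/2.
    """
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def deep_sawtooth(x, depth):
    """Compose the tent layer `depth` times (2 * depth ReLUs in total)."""
    for _ in range(depth):
        x = tent_layer(x)
    return x


def count_peaks(depth, grid_bits=12):
    """Count strict local maxima of the depth-`depth` sawtooth on a dyadic grid.

    Assumes depth < grid_bits so that every peak lies on a grid point.
    """
    n = 2 ** grid_bits
    ys = [deep_sawtooth(i / n, depth) for i in range(n + 1)]
    return sum(
        1 for i in range(1, n) if ys[i] > ys[i - 1] and ys[i] > ys[i + 1]
    )


if __name__ == "__main__":
    # Columns: depth, number of ReLU units, number of peaks on [0, 1].
    for depth in range(1, 7):
        print(depth, 2 * depth, count_peaks(depth))
```

In this toy example the number of ReLUs grows linearly in the depth while the number of oscillations grows exponentially, which is the qualitative phenomenon the abstract refers to; the quantitative rates for $\mathcal{G}_D$ are those proved in the paper, not anything derived from this sketch.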