This work studies the expressivity of ReLU neural networks with a focus on their depth. A sequence of previous works showed that $\lceil \log_2(n+1) \rceil$ hidden layers are sufficient to compute all continuous piecewise linear (CPWL) functions on $\mathbb{R}^n$. Hertrich, Basu, Di Summa, and Skutella (NeurIPS'21 / SIDMA'23) conjectured that this result is optimal in the sense that there are CPWL functions on $\mathbb{R}^n$, like the maximum function, that require this depth. We disprove the conjecture and show that $\lceil\log_3(n-1)\rceil+1$ hidden layers are sufficient to compute all CPWL functions on $\mathbb{R}^n$. A key step in the proof is that ReLU neural networks with two hidden layers can exactly represent the maximum function of five inputs. More generally, we show that $\lceil\log_3(n-2)\rceil+1$ hidden layers are sufficient to compute the maximum of $n\geq 4$ numbers. Our constructions almost match the $\lceil\log_3(n)\rceil$ lower bound of Averkov, Hojny, and Merkert (ICLR'25) in the special case of ReLU networks with weights that are decimal fractions. The constructions have a geometric interpretation via polyhedral subdivisions of the simplex into ``easier'' polytopes.
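For intuition behind these logarithmic depth bounds (a standard identity, not taken from the abstract above), the maximum of two numbers is exactly representable by a ReLU network with a single hidden layer:
\[
\max(x,y) \;=\; \mathrm{ReLU}(y-x) + \mathrm{ReLU}(x) - \mathrm{ReLU}(-x),
\]
and composing such pairwise maxima in a balanced binary tree computes the maximum of $n$ numbers with $\lceil\log_2 n\rceil$ hidden layers. This is the classical base-2 baseline that the base-3 constructions described in the abstract improve upon.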