This work studies the expressivity of ReLU neural networks with a focus on their depth. A sequence of previous works showed that $\lceil \log_2(n+1) \rceil$ hidden layers are sufficient to compute all continuous piecewise linear (CPWL) functions on $\mathbb{R}^n$. Hertrich, Basu, Di Summa, and Skutella (NeurIPS'21 / SIDMA'23) conjectured that this result is optimal in the sense that there are CPWL functions on $\mathbb{R}^n$, like the maximum function, that require this depth. We disprove the conjecture and show that $\lceil\log_3(n-1)\rceil+1$ hidden layers are sufficient to compute all CPWL functions on $\mathbb{R}^n$. A key step in the proof is that ReLU neural networks with two hidden layers can exactly represent the maximum function of five inputs. More generally, we show that $\lceil\log_3(n-2)\rceil+1$ hidden layers are sufficient to compute the maximum of $n\geq 4$ numbers. Our constructions almost match the $\lceil\log_3(n)\rceil$ lower bound of Averkov, Hojny, and Merkert (ICLR'25) in the special case of ReLU networks with weights that are decimal fractions. The constructions have a geometric interpretation via polyhedral subdivisions of the simplex into ``easier'' polytopes.
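For intuition behind these logarithmic depth bounds (a standard identity, not taken from the abstract above), the maximum of two numbers is exactly representable by a ReLU network with a single hidden layer:
\[
\max(x,y) \;=\; \mathrm{ReLU}(y-x) + \mathrm{ReLU}(x) - \mathrm{ReLU}(-x),
\]
and composing such pairwise maxima in a balanced binary tree computes the maximum of $n$ numbers with $\lceil\log_2 n\rceil$ hidden layers. This is the classical base-2 baseline that the base-3 constructions described in the abstract improve upon.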