In this paper we present an approach to determining the smallest number of neurons in a neural-network layer such that the topology of the input space can be learned sufficiently well. We introduce a general procedure based on persistent homology to investigate topological invariants of the manifold on or near which we suspect the data set lies. We specify the required dimension precisely, assuming that there is a smooth manifold on or near which the data are located. Furthermore, we require that this space is connected and carries a commutative group structure in the mathematical sense. These assumptions allow us to derive a decomposition of the underlying space whose topology is well known. We use the representatives of the $k$-dimensional homology groups from the persistence landscape to determine an integer dimension for this decomposition. This number is the dimension of an embedding that is capable of capturing the topology of the data manifold. We derive the theory and validate it experimentally on toy data sets.
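To make the role of persistent homology concrete, the following is a minimal, self-contained sketch of the simplest ($0$-dimensional) case: the persistence of connected components in a Vietoris–Rips filtration of a point cloud, computed with a union–find structure and the elder rule. The abstract's procedure works with $k$-dimensional homology groups and persistence landscapes, for which a dedicated library (e.g. GUDHI or Ripser) would be used in practice; the point-cloud data and the lifetime threshold below are illustrative assumptions, not part of the paper.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """0-dimensional persistent homology of a Vietoris-Rips filtration.

    Every point is born as its own component at scale 0. When an edge of
    length d first connects two components, one of them dies at d (elder
    rule). Returns (birth, death) pairs; one component never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Path-halving union-find lookup.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Filtration: all pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    diagram = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            diagram.append((0.0, d))   # a component dies at scale d
    diagram.append((0.0, math.inf))    # one component persists forever
    return diagram

# Illustrative data: two well-separated clusters, so two long-lived
# H_0 features should survive (Betti number beta_0 = 2).
cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
diag = h0_persistence(cloud)
long_lived = [p for p in diag if p[1] - p[0] > 1.0]  # assumed threshold
print(len(long_lived))  # → 2
```

Long-lived points in the diagram correspond to genuine topological features of the underlying space, while short-lived ones are sampling noise; the same birth–death bookkeeping, carried out for higher-dimensional homology classes, yields the representatives from which the integer dimension of the decomposition is read off.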