We show that the mutual information between the representation of a learning machine and the hidden features that it extracts from data is bounded from below by the relevance, which is the entropy of the model's energy distribution. Models with maximal relevance -- which we call Optimal Learning Machines (OLM) -- are hence expected to extract maximally informative representations. We explore this principle in a range of models. For fully connected Ising models, we show that {\em i)} OLM are characterised by inhomogeneous distributions of couplings, and that {\em ii)} their learning performance is affected by sub-extensive features that are elusive to a thermodynamic treatment. On specific learning tasks, we find that likelihood maximisation is achieved by models with maximal relevance. Training of Restricted Boltzmann Machines on the MNIST benchmark shows that learning is associated with a broadening of the spectrum of energy levels, and that the internal representation of the hidden layer approaches the maximal relevance that can be achieved with a finite dataset. Finally, we discuss a Gaussian learning machine that clarifies that learning hidden features is conceptually different from parameter estimation.