We propose and study a method for learning interpretable representations for the task of regression. Features are represented as networks of multi-type expression trees composed of activation functions common in neural networks alongside other elementary functions. Differentiable features are trained via gradient descent, and the performance of features in a linear model is used to weight the rate of change among the subcomponents of each representation. The search process maintains an archive of representations with accuracy-complexity trade-offs to assist generalization and interpretation. We compare several stochastic optimization approaches within this framework and benchmark the resulting variants against state-of-the-art machine learning approaches on 100 open-source regression problems. Our main finding is that this approach produces the highest average test scores across problems while yielding representations that are orders of magnitude smaller than those of the next-best-performing method (gradient boosting). We also report a negative result in which attempts to directly optimize the disentanglement of the representation result in more highly correlated features.
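To make the framework concrete, here is a minimal sketch in Python, not the authors' implementation: in place of evolved multi-type expression trees, each feature takes a single fixed differentiable form tanh(w·x + b), and the toy dataset, feature count, learning rate, and perturbation scheme are all illustrative assumptions. The sketch shows the three ingredients named above: gradient-descent training of differentiable features, a linear model whose coefficient magnitudes weight how strongly each feature is perturbed, and an accuracy-complexity archive.

```python
# Minimal sketch of the described framework (assumptions noted above).
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (assumption: any (X, y) pair works here).
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

def make_feature():
    """One differentiable feature phi(x) = tanh(w . x + b)."""
    return {"w": rng.normal(size=X.shape[1]), "b": 0.0}

def forward(f, X):
    return np.tanh(X @ f["w"] + f["b"])

def train_features(features, X, y, lr=0.05, epochs=200, ridge=1e-3):
    """Alternate solving the linear model and gradient steps on each feature."""
    for _ in range(epochs):
        Phi = np.column_stack([forward(f, X) for f in features])
        # Ridge-regularized linear model fit on top of the current features.
        A = Phi.T @ Phi + ridge * np.eye(len(features))
        coef = np.linalg.solve(A, Phi.T @ y)
        resid = Phi @ coef - y
        for j, f in enumerate(features):
            z = X @ f["w"] + f["b"]
            # d(MSE)/d(feature_j), chained through tanh.
            g = (2.0 / len(y)) * coef[j] * resid * (1.0 - np.tanh(z) ** 2)
            f["w"] -= lr * (X.T @ g)
            f["b"] -= lr * g.sum()
    return coef

features = [make_feature() for _ in range(4)]
coef = train_features(features, X, y)

# Coefficient magnitudes weight the rate of change: important features are
# perturbed less (a stand-in for the paper's weighted variation of subtrees).
importance = np.abs(coef) / np.abs(coef).sum()
for f, imp in zip(features, importance):
    f["w"] += (1.0 - imp) * 0.1 * rng.normal(size=f["w"].shape)

# Accuracy-complexity archive entry; the real method would keep a Pareto
# front of such points and measure complexity over expression-tree nodes.
Phi = np.column_stack([forward(f, X) for f in features])
mse = float(np.mean((Phi @ coef - y) ** 2))
archive = [(len(features), mse)]
print(f"MSE: {mse:.3f}, importances: {np.round(importance, 2)}")
```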