High-Order Optimization of Gradient Boosted Decision Trees

Gradient Boosted Decision Trees (GBDTs) are dominant machine learning algorithms for modeling discrete or tabular data. Unlike neural networks with millions of trainable parameters, GBDTs optimize the loss function in an additive manner and have a single trainable parameter per leaf, which makes it easy to apply high-order optimization of the loss function. In this paper, we introduce high-order optimization for GBDTs based on numerical optimization theory, which allows us to construct trees based on high-order derivatives of a given loss function. In the experiments, we show that high-order optimization has faster per-iteration convergence, which leads to reduced running time. Our solution can be easily parallelized and run on GPUs with little code overhead. Finally, we discuss potential future improvements such as automatic differentiation of arbitrary loss functions and the combination of GBDTs with neural networks.
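Because each leaf carries a single weight w, finding its value is a one-dimensional root-finding problem on the summed gradient, which is why higher-order numerical methods apply so directly. The abstract does not give the exact update rule, so the following is only a sketch under our own assumptions: we use the logistic loss and, for one leaf with fixed current scores, compare the familiar second-order (Newton) leaf value w = -G / H with a third-order step from Halley's method, w = -2GH / (2H^2 - GT). Here G, H and T are the sums over the leaf's samples of the first, second and third derivatives of the loss (H includes any L2 penalty). The helper names `logloss_derivatives` and `leaf_value` are hypothetical and are not taken from the paper or from any GBDT library.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logloss_derivatives(y: float, f: float) -> tuple[float, float, float]:
    """First, second and third derivatives of the logistic loss
    l(y, f) = log(1 + exp(f)) - y * f with respect to the raw score f."""
    p = sigmoid(f)
    g = p - y                # dl/df
    h = p * (1.0 - p)        # d2l/df2
    t = h * (1.0 - 2.0 * p)  # d3l/df3
    return g, h, t


def leaf_value(ys, scores, order=2, reg_lambda=0.0):
    """Hypothetical helper: one step for the single weight w of a leaf,
    starting from w = 0, given the labels and current raw scores of the
    samples that fall into that leaf.

    order=2: Newton step  w = -G / H
    order=3: Halley step  w = -2GH / (2H^2 - GT)
    (an assumed third-order rule; the paper's exact update may differ)
    """
    G = H = T = 0.0
    for y, f in zip(ys, scores):
        g, h, t = logloss_derivatives(y, f)
        G += g
        H += h
        T += t
    # An L2 penalty (reg_lambda / 2) * w^2 adds reg_lambda to the second
    # derivative and nothing to the gradient at w = 0 or to the third derivative.
    H += reg_lambda
    if order == 2:
        return -G / H
    if order == 3:
        return -2.0 * G * H / (2.0 * H * H - G * T)
    raise ValueError("order must be 2 or 3")


if __name__ == "__main__":
    ys = [1, 1, 1, 0]
    scores = [1.0] * 4
    # Exact leaf weight: root of sum_i (sigmoid(1 + w) - y_i) = 0, i.e. 1 + w = log 3.
    exact = math.log(3.0) - 1.0
    print("newton:", leaf_value(ys, scores, order=2))
    print("halley:", leaf_value(ys, scores, order=3))
    print("exact: ", exact)
```

In this toy leaf the exact minimizer is log 3 - 1 (about 0.0986), and the single third-order step lands noticeably closer to it than the single Newton step, illustrating the kind of faster per-iteration convergence the abstract refers to. Each extra order only needs one more derivative summed per leaf, which is also why such updates parallelize as easily as the usual gradient and Hessian sums.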



Loss function (machine learning)


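A loss function (in AI sometimes loosely called a distance function or metric function, where "distance" is meant abstractly) measures the error between the true values and the predicted values. It estimates how far a model's prediction f(x) deviates from the true value Y; it is a non-negative real-valued function, usually written L(Y, f(x)), and the smaller the loss, the better the model fits the data. The loss function is the core of the empirical risk and an important component of the structural risk (the empirical risk plus a regularization term).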
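To make the definition concrete, here is a minimal sketch with two common choices of L(Y, f(x)), the squared loss and the absolute loss, together with the empirical risk (the average loss over a sample) and a structural risk that adds a penalty on model complexity. The function names and the choice of an L2 penalty on a weight vector are illustrative assumptions.

```python
def squared_loss(y: float, fx: float) -> float:
    """L(Y, f(x)) = (Y - f(x))^2, always non-negative."""
    return (y - fx) ** 2


def absolute_loss(y: float, fx: float) -> float:
    """L(Y, f(x)) = |Y - f(x)|, always non-negative."""
    return abs(y - fx)


def empirical_risk(loss, ys, preds) -> float:
    """Average loss over the training sample."""
    return sum(loss(y, p) for y, p in zip(ys, preds)) / len(ys)


def structural_risk(loss, ys, preds, weights, lam: float) -> float:
    """Empirical risk plus an (assumed) L2 complexity penalty lam * sum(w^2)."""
    return empirical_risk(loss, ys, preds) + lam * sum(w * w for w in weights)


ys = [1.0, 2.0, 3.0]
preds = [1.0, 2.0, 5.0]
print(empirical_risk(squared_loss, ys, preds))   # (0 + 0 + 4) / 3
print(empirical_risk(absolute_loss, ys, preds))  # (0 + 0 + 2) / 3
print(structural_risk(squared_loss, ys, preds, [1.0, -2.0], 0.5))
```

The same predictions give different risks under different losses: the squared loss penalizes the single large error more heavily than the absolute loss does, and the structural risk trades that fit against the size of the weights.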