Natural gradient has recently been introduced to the field of boosting to enable generic probabilistic prediction. Natural gradient boosting shows promising performance improvements on small datasets thanks to better training dynamics, but it suffers from substantial training-speed overhead, especially on large datasets. We present a replication study of NGBoost (Duan et al., 2019) training that carefully examines the impact of key hyperparameters under best-first decision tree learning. We find that, with leaf-number clipping as a regularizer, the performance of NGBoost can be largely improved through a better choice of hyperparameters. Experiments show that our approach significantly beats the state-of-the-art performance on a variety of datasets from the UCI Machine Learning Repository while achieving up to a 4.85x speedup over the original NGBoost approach.
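The setup described above can be sketched in a few lines, assuming scikit-learn's DecisionTreeRegressor (where setting max_leaf_nodes switches the tree to best-first growth and clips the leaf count) as the base learner, together with the public ngboost package API; the specific hyperparameter values below are illustrative placeholders, not the tuned values from this study.

```python
# Minimal sketch: NGBoost with a best-first base tree whose leaf count is clipped.
from ngboost import NGBRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Setting max_leaf_nodes makes scikit-learn grow the tree best-first and
# caps the number of leaves (the leaf-number-clipping regularizer).
# The value 31 is illustrative, not a recommendation from the paper.
base_learner = DecisionTreeRegressor(criterion="friedman_mse", max_leaf_nodes=31)

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ngb = NGBRegressor(Base=base_learner, n_estimators=500, learning_rate=0.01)
ngb.fit(X_train, y_train)

print(ngb.predict(X_test)[:5])       # point predictions
print(ngb.pred_dist(X_test).params)  # parameters of the predictive distribution
```

Because the base tree is grown best-first, max_leaf_nodes bounds model complexity directly, which is what makes a wider sweep over the remaining hyperparameters (learning rate, number of estimators) tractable.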