Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets $f$ that are $k$-juntas, they showed that these heuristics successfully learn $f$ with depth-$k$ decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-$k$ decision trees. We provide a counterexample to this conjecture: we construct targets that are depth-$k$ decision trees and show that even in the smoothed setting, these heuristics build trees of depth $2^{\Omega(k)}$ before achieving high accuracy. We also show that the guarantees of Brutzkus et al. cannot extend to the agnostic setting: there are targets that are very close to $k$-juntas, for which these heuristics build trees of depth $2^{\Omega(k)}$ before achieving high accuracy.
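For context, here is a minimal sketch of the kind of greedy top-down heuristic (in the ID3/CART style) that the abstract refers to: grow the tree by splitting each node on the coordinate with the largest information gain, recursing until a depth budget is exhausted. The function names, the impurity measure, and the binary-feature encoding below are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def entropy(y):
    """Binary entropy of the label vector y (labels in {0, 1})."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def best_split(X, y):
    """Return the coordinate whose split maximizes information gain."""
    base = entropy(y)
    gains = []
    for i in range(X.shape[1]):
        mask = X[:, i] == 1
        gain = base - (mask.mean() * entropy(y[mask])
                       + (1 - mask.mean()) * entropy(y[~mask]))
        gains.append(gain)
    return int(np.argmax(gains))

def grow_tree(X, y, depth_budget):
    """Greedy top-down growth: split on the highest-gain coordinate and
    recurse until the depth budget is exhausted or the node is pure."""
    if depth_budget == 0 or len(y) == 0 or entropy(y) == 0.0:
        return {"leaf": int(round(np.mean(y))) if len(y) else 0}
    i = best_split(X, y)
    mask = X[:, i] == 1
    return {"split_on": i,
            "left":  grow_tree(X[~mask], y[~mask], depth_budget - 1),
            "right": grow_tree(X[mask],  y[mask],  depth_budget - 1)}
```

The paper's lower bounds concern exactly this kind of procedure: the counterexample targets force such a heuristic to take `depth_budget` as large as $2^{\Omega(k)}$ before the resulting tree attains high accuracy, even though the target itself is a depth-$k$ tree.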