Reducing the processing time of large deep learning models is a fundamental challenge in many real-world applications. Early exit methods strive toward this goal by attaching additional Internal Classifiers (ICs) to intermediate layers of a neural network. ICs can quickly return predictions for easy examples and, as a result, reduce the average inference time of the whole model. However, if a particular IC does not decide to return an answer early, its predictions are discarded and its computations are effectively wasted. To solve this issue, we introduce Zero Time Waste (ZTW), a novel approach in which each IC reuses the predictions returned by its predecessors by (1) adding direct connections between ICs and (2) combining previous outputs in an ensemble-like manner. We conduct extensive experiments across various datasets and architectures to demonstrate that ZTW achieves a significantly better accuracy vs. inference time trade-off than other recently proposed early exit methods.
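To make the mechanism described above concrete, the following is a minimal PyTorch-style sketch of an early-exit network with direct (cascade) connections between ICs and a running ensemble of their predictions. This is an illustration under our own simplifying assumptions, not the authors' implementation: the class names (`CascadingIC`, `EarlyExitNet`), the learned linear combination of logits, the arithmetic-mean ensemble, and the whole-batch confidence threshold are all hypothetical stand-ins for the ZTW design.

```python
import torch
import torch.nn as nn


class CascadingIC(nn.Module):
    """An internal classifier that reuses its predecessor's logits
    via a direct cascade connection (simplified illustration)."""

    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.head = nn.Linear(feat_dim, num_classes)
        # Learnable combination of this IC's logits with the previous IC's.
        self.combine = nn.Linear(2 * num_classes, num_classes)

    def forward(self, features, prev_logits=None):
        logits = self.head(features.flatten(1))
        if prev_logits is not None:
            logits = self.combine(torch.cat([logits, prev_logits], dim=1))
        return logits


class EarlyExitNet(nn.Module):
    """Backbone blocks interleaved with ICs; predictions of earlier ICs
    are never discarded but folded into a running ensemble."""

    def __init__(self, blocks, ics, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.ics = nn.ModuleList(ics)
        self.threshold = threshold  # confidence needed to exit early

    def forward(self, x):
        prev_logits = None
        ensemble = None  # running (ensemble-like) average of IC predictions
        for i, (block, ic) in enumerate(zip(self.blocks, self.ics)):
            x = block(x)
            prev_logits = ic(x, prev_logits)
            probs = prev_logits.softmax(dim=1)
            ensemble = probs if ensemble is None else (ensemble * i + probs) / (i + 1)
            # Whole-batch exit criterion for simplicity; per-example
            # early exiting would route each sample independently.
            if ensemble.max(dim=1).values.min() >= self.threshold:
                return ensemble, i
        return ensemble, len(self.blocks) - 1


# Toy usage: a 3-block MLP backbone with an IC after every block.
blocks = [nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(3)]
ics = [CascadingIC(feat_dim=32, num_classes=10) for _ in range(3)]
model = EarlyExitNet(blocks, ics, threshold=0.9)
probs, exit_idx = model(torch.randn(4, 32))
print(probs.shape, exit_idx)  # torch.Size([4, 10]) and the index of the exit taken
```

The key design point the sketch mirrors is that an IC's logits feed forward into the next IC and into the ensemble, so no computation is discarded when an exit is not taken.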