The recent rise of privacy concerns has led researchers to devise methods for private neural inference -- where inferences are made directly on encrypted data, without ever seeing the inputs. The primary challenge facing private inference is that computing on encrypted data levies an impractically high latency penalty, stemming mostly from non-linear operators like ReLU. Enabling practical and private inference requires new optimization methods that minimize network ReLU counts while preserving accuracy. This paper proposes DeepReDuce: a set of optimizations for the judicious removal of ReLUs to reduce private inference latency. The key insight is that not all ReLUs contribute equally to accuracy. We leverage this insight to drop, or remove, ReLUs from classic networks to significantly reduce inference latency and maintain high accuracy. Given a target network, DeepReDuce outputs a Pareto frontier of networks that trade off ReLU count and accuracy. Compared to the state-of-the-art for private inference, DeepReDuce improves accuracy and reduces ReLU count by up to 3.5% (iso-ReLU count) and 3.5$\times$ (iso-accuracy), respectively.
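To make the notion of "dropping" ReLUs concrete, the sketch below replaces selected ReLU activations with identity operations in a PyTorch model. This is only an illustrative example, not the paper's implementation: the function name `drop_relus`, the choice of which ReLUs to keep, and any subsequent fine-tuning are assumptions here.

```python
# Illustrative sketch: remove ReLUs by swapping nn.ReLU modules for nn.Identity.
# Which ReLUs to keep (and how the network is retrained afterward) is assumed,
# not taken from the paper.
import torch.nn as nn

def drop_relus(model: nn.Module, keep: set, prefix: str = "") -> nn.Module:
    """Replace every nn.ReLU whose qualified name is not in `keep` with nn.Identity."""
    for name, module in model.named_children():
        full_name = f"{prefix}{name}"
        if isinstance(module, nn.ReLU) and full_name not in keep:
            setattr(model, name, nn.Identity())  # drop this ReLU
        else:
            drop_relus(module, keep, prefix=full_name + ".")  # recurse into submodules
    return model

# Toy usage: keep only the first ReLU of a small two-conv block.
block = nn.Sequential(
    nn.Conv2d(3, 16, 3), nn.ReLU(),
    nn.Conv2d(16, 16, 3), nn.ReLU(),
)
block = drop_relus(block, keep={"1"})
print(block)  # the second ReLU is now an Identity, halving the block's ReLU count
```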