Machine learning (ML) has entered the mobile era, with an enormous number of ML models deployed on edge devices. However, continuously running common ML models on edge devices can generate excessive heat, forcing the device to slow down to prevent overheating, a phenomenon called thermal throttling. This paper studies the impact of thermal throttling on mobile phones: when it occurs, the CPU clock frequency is reduced, and model inference latency may increase dramatically. This inconsistent behavior substantially degrades the user experience, yet it has long been overlooked. To counter thermal throttling, we propose using dynamic networks with shared weights that shift seamlessly between large and small ML models according to the device's thermal profile, i.e., shifting to a small model when the system is about to throttle. With the proposed dynamic shifting, the application runs consistently, without CPU clock frequency degradation or increased latency. We also study the resulting accuracy when dynamic shifting is deployed and show that our approach provides a reasonable trade-off between model latency and model accuracy.
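The shifting rule described above (move to the small model when the system is about to throttle) can be illustrated with a minimal sketch. Everything named here is hypothetical: the class `ThermalShifter`, the helper `select_modes`, and both temperature thresholds are assumptions chosen for illustration, not values or interfaces from the paper. The sketch also assumes a hysteresis band (two different thresholds) so the choice does not oscillate near a single cutoff; the abstract does not say whether the authors do this.

```python
from dataclasses import dataclass


@dataclass
class ThermalShifter:
    """Pick a sub-network size from the current device temperature.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    shift_down_c: float = 40.0  # assumed: switch to the small model at or above this
    shift_up_c: float = 36.0    # assumed: return to the large model at or below this
    mode: str = "large"

    def update(self, temp_c: float) -> str:
        # Hysteresis: the two thresholds differ so that the mode does not
        # flip back and forth when the temperature hovers near one value.
        if self.mode == "large" and temp_c >= self.shift_down_c:
            self.mode = "small"
        elif self.mode == "small" and temp_c <= self.shift_up_c:
            self.mode = "large"
        return self.mode


def select_modes(temps_c, shifter=None):
    """Return the model size chosen for each temperature sample, in order."""
    if shifter is None:
        shifter = ThermalShifter()
    return [shifter.update(t) for t in temps_c]


if __name__ == "__main__":
    # Hypothetical temperature trace in degrees Celsius.
    trace = [34.0, 38.0, 40.5, 39.0, 37.0, 35.5, 38.0]
    print(select_modes(trace))
    # → ['large', 'large', 'small', 'small', 'small', 'large', 'large']
```

In a real deployment the returned mode would select which slice of the shared-weight network to run for the next inference, and the temperature would come from the platform's thermal interface rather than a fixed list.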