With the rapid development of deep learning techniques, voice services implemented on various Internet of Things (IoT) devices are ever more popular. In this paper, we examine user-level membership inference in the problem space of voice services by designing an audio auditor that verifies whether a specific user has unwillingly contributed audio used to train an automatic speech recognition (ASR) model, under strict black-box access. Using user representations of the input audio data and their corresponding transcribed text, our trained auditor is effective for user-level auditing. We also observe that an auditor trained on specific data generalizes well regardless of the target ASR model's architecture. We validate the auditor on ASR models trained with LSTM, RNN, and GRU algorithms on two state-of-the-art pipelines: the hybrid ASR system and the end-to-end ASR system. Finally, we conduct a real-world trial of our auditor on iPhone Siri, achieving an overall accuracy exceeding 80\%. We hope the methodology and findings developed in this paper can inform privacy advocates seeking to overhaul IoT privacy.