Eavesdropping on a user's smartphone is a well-known threat to safety and privacy. Existing studies show that loudspeaker reverberation can inject speech into motion sensor readings, enabling speech eavesdropping. Ear speakers, by contrast, produce much smaller vibrations, so the more devastating attack of eavesdropping on them with zero-permission motion sensors was believed to be impossible. In this work, we revisit this important line of research. We explore the recent trend among smartphone manufacturers of including extra or more powerful speakers in place of small ear speakers, and demonstrate the feasibility of using motion sensors to capture such tiny speech vibrations. We investigate the impact of these new ear speakers on built-in motion sensors and examine the potential to elicit private speech information from the minute vibrations. Our system, EarSpy, detects word regions, extracts time- and frequency-domain features, and generates a spectrogram for each word region. We train and test on the extracted data using classical machine learning algorithms and convolutional neural networks. We achieve up to 98.66% accuracy in gender detection, 92.6% accuracy in speaker detection, and 56.42% accuracy in digit detection (more than five times the 10% accuracy of random guessing). Our results unveil the potential threat of eavesdropping on phone conversations from ear speakers using motion sensors.
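The word-region and spectrogram steps mentioned in the abstract can be illustrated with a minimal sketch. This is not EarSpy's implementation: the energy-threshold detector, the Hann-windowed naive DFT, the 500 Hz sampling rate, the frame sizes, the threshold, and all function names are assumptions chosen for illustration, and the input is a synthetic trace rather than real accelerometer data.

```python
import cmath
import math


def short_time_energy(signal, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def detect_word_regions(signal, frame_len, threshold):
    """Return (start, end) sample ranges whose frame energy exceeds threshold."""
    regions, start = [], None
    for i, energy in enumerate(short_time_energy(signal, frame_len)):
        if energy > threshold and start is None:
            start = i * frame_len                      # region opens
        elif energy <= threshold and start is not None:
            regions.append((start, i * frame_len))     # region closes
            start = None
    if start is not None:                              # region runs to the end
        regions.append((start, (len(signal) // frame_len) * frame_len))
    return regions


def spectrogram(segment, frame_len, hop):
    """Magnitude spectrogram (rows are time frames) via a Hann-windowed naive DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for s in range(0, len(segment) - frame_len + 1, hop):
        x = [segment[s + n] * window[n] for n in range(frame_len)]
        frames.append([
            abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


# Synthetic stand-in for one accelerometer axis (all values are assumptions):
# silence, a 120 Hz vibration burst playing the role of a "word", then silence.
FS = 500          # assumed sampling rate in Hz
FRAME_LEN = 50    # assumed analysis frame length in samples
silence = [0.0] * 200
burst = [0.01 * math.sin(2 * math.pi * 120 * n / FS) for n in range(200)]
trace = silence + burst + silence

regions = detect_word_regions(trace, frame_len=FRAME_LEN, threshold=1e-6)
start, end = regions[0]
spec = spectrogram(trace[start:end], frame_len=FRAME_LEN, hop=25)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * FS / FRAME_LEN
```

On this synthetic trace the detector returns a single region covering samples 200 to 400, and the strongest spectrogram bin of its first frame sits at 120 Hz. In a pipeline of the kind the abstract describes, each per-region spectrogram would then be passed to a classifier; that stage is omitted here.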