Speaker Recognition and Speaker Identification are challenging tasks with essential applications such as automation, authentication, and security. Deep learning approaches like SincNet and AM-SincNet have achieved strong results on these tasks. Their promising performance has brought these models into real-world applications that are becoming fundamentally end-user driven and mostly mobile. Mobile computation requires applications with a small storage footprint, low processing and memory demands, and efficient energy consumption. Deep learning approaches, in contrast, are usually energy expensive and demand significant storage, processing power, and memory. To address this demand, we propose a portable model called Additive Margin MobileNet1D (AM-MobileNet1D) for Speaker Identification on mobile devices. We evaluated the proposed approach on the TIMIT and MIT datasets, obtaining equivalent or better performance compared to the baseline methods. Additionally, the proposed model takes only 11.6 megabytes of disk storage against 91.2 megabytes for the SincNet and AM-SincNet architectures, making the model seven times faster, with eight times fewer parameters.