Speaker Recognition and Speaker Identification are challenging tasks with essential applications such as automation, authentication, and security. Deep learning approaches like SincNet and AM-SincNet have achieved strong results on these tasks. Their promising performance has brought these models into real-world applications that are becoming fundamentally end-user driven and mostly mobile. Mobile computation requires applications with a small storage footprint, low processing and memory demands, and efficient energy consumption. Deep learning approaches, in contrast, are usually energy expensive and demand significant storage, processing power, and memory. To address this demand, we propose a portable model called Additive Margin MobileNet1D (AM-MobileNet1D) for Speaker Identification on mobile devices. We evaluated the proposed approach on the TIMIT and MIT datasets, obtaining equivalent or better performance compared to the baseline methods. Additionally, the proposed model takes only 11.6 megabytes of disk storage against 91.2 megabytes for the SincNet and AM-SincNet architectures, making the model seven times faster, with eight times fewer parameters.