Although deep neural networks are successful for many tasks in the speech domain, their high computational and memory costs make it difficult to deploy high-performance neural network systems directly on low-resource embedded devices. Several mechanisms exist to reduce the size of neural networks, e.g., parameter pruning and parameter quantization. This paper focuses on applying binary neural networks to the task of speaker verification. The proposed binarization of training parameters largely maintains performance while significantly reducing storage requirements and computational costs. Experimental results show that, after binarizing the convolutional neural network, the ResNet34-based network achieves an EER of around 5% on the VoxCeleb1 test set and even outperforms the traditional real-valued network on the text-dependent Xiaole dataset, while yielding a 32x memory saving.
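The weight binarization described above can be illustrated with a minimal sketch. This example uses XNOR-Net-style binarization (sign of each weight plus a per-filter scaling factor), a common BNN scheme; the paper's exact binarization method and the function name `binarize_weights` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def binarize_weights(w):
    """Binarize a real-valued conv weight tensor to {-1, +1} with a
    per-output-channel scale alpha = mean(|w|) (XNOR-Net-style; the
    paper's exact scheme may differ).

    w: array of shape (out_channels, in_channels, kH, kW)
    Returns (binary weights, per-filter scales).
    """
    # One scale per output filter, computed from the mean absolute weight.
    alpha = np.abs(w).reshape(w.shape[0], -1).mean(axis=1)
    # Sign function mapped to {-1, +1} (zero treated as +1).
    b = np.where(w >= 0, 1.0, -1.0)
    return b, alpha

# Toy example: 4 filters of shape 3x3x3.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3, 3, 3)).astype(np.float32)
b, alpha = binarize_weights(w)
# Each binary weight needs 1 bit instead of 32 bits, hence the ~32x
# memory saving cited in the abstract.
```

At inference time the binarized convolution can be computed with bitwise XNOR and popcount operations and then rescaled by `alpha`, which is where the computational savings come from.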