This paper presents a study on the use of federated learning to train an ASR model based on a wav2vec 2.0 model pre-trained by self-supervision. Our experiments, carried out on the well-known TED-LIUM 3 dataset, show that such a model can reach, without using any language model, a word error rate of 10.92% on the official TED-LIUM 3 test set, while never sharing any data from the different users. We also analyse the ASR performance for speakers depending on their participation in the federated learning process. Since federated learning was first introduced for privacy purposes, we also measure its ability to protect speaker identity. To do so, we exploit an approach that analyzes the information contained in the exchanged models, based on a neural network footprint computed on an indicator dataset. This analysis is performed layer-wise and shows which layers of an exchanged wav2vec 2.0-based model carry speaker identity information.