In this paper, we present a semi-supervised training technique using pseudo-labeling for end-to-end neural diarization (EEND). The EEND system has shown promising performance compared with traditional clustering-based methods, especially on overlapping speech. However, to train a well-tuned model, EEND requires labeled data covering the joint speech activities of every speaker at each time frame in a recording. In this paper, we explore a pseudo-labeling approach that exploits unlabeled data. First, we propose an iterative pseudo-label method for EEND that trains the model on unlabeled data from a target condition. Second, we propose a committee-based training method to further improve EEND performance. To evaluate the proposed methods, we conduct model adaptation experiments using both labeled and unlabeled data. Experimental results on the CALLHOME dataset show that our pseudo-labeling method achieves a 37.4% relative diarization error rate reduction compared to a seed model. We further analyze the results of semi-supervised adaptation with pseudo-labeling, and also demonstrate the effectiveness of our approach on the third DIHARD dataset.
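The iterative pseudo-label loop described above can be sketched generically as self-training: fit a model on labeled data, assign pseudo-labels to confident unlabeled examples, fold them into the training set, and repeat. The toy one-dimensional nearest-centroid "model", the confidence margin, and all names below are hypothetical stand-ins for illustration, not the paper's actual EEND model or training procedure.

```python
# Minimal sketch of iterative pseudo-labeling (self-training).
# A toy nearest-centroid classifier stands in for the EEND model.

def fit_centroids(labeled):
    """Compute per-class means from (value, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Return (label, confidence); confidence is the margin
    between the two nearest class centroids."""
    dists = sorted((abs(x - c), y) for y, c in centroids.items())
    label = dists[0][1]
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return label, margin

def iterative_pseudo_label(labeled, unlabeled, rounds=3, threshold=1.0):
    """Repeatedly retrain on labeled + confidently pseudo-labeled data."""
    labeled = list(labeled)
    for _ in range(rounds):
        centroids = fit_centroids(labeled)
        remaining = []
        for x in unlabeled:
            y, conf = predict(centroids, x)
            if conf >= threshold:        # accept only confident pseudo-labels
                labeled.append((x, y))
            else:                        # keep ambiguous points for later rounds
                remaining.append(x)
        unlabeled = remaining
    return fit_centroids(labeled)

seed = [(0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b")]   # labeled seed data
pool = [0.5, 0.8, 9.5, 9.8, 5.0]   # unlabeled pool; 5.0 stays ambiguous
model = iterative_pseudo_label(seed, pool)
```

The confidence threshold plays the role of filtering unreliable pseudo-labels; a committee-based variant would instead compare the predictions of several models and keep only examples on which they agree.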