We describe the Phonexia submission to the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21) in the unsupervised speaker verification track. Our solution was very similar to IDLab's winning submission for VoxSRC-20. An embedding extractor was first bootstrapped using momentum contrastive learning, with input augmentations serving as the only source of supervision. This was followed by several iterations of clustering to assign pseudo-speaker labels, which were then used for supervised training of the embedding extractor. Finally, score fusion was performed by averaging the zt-normalized cosine scores of five different embedding extractors. We also briefly describe unsuccessful alternatives involving i-vectors instead of DNN embeddings and PLDA instead of cosine scoring.
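The final fusion step above can be sketched as follows. This is a minimal, simplified illustration (not the authors' exact pipeline): trial scores are cosine similarities between embeddings, each system's scores are zt-normalized against a cohort, and the per-system normalized scores are averaged. The function names, the cohort handling, and the use of a single cohort for both the z- and t-statistics are simplifying assumptions; in practice the z- and t-cohorts are typically separate sets.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def zt_norm(score, enroll, test, cohort):
    """Simplified zt-norm of a single trial score (illustrative only).

    z-norm: normalize the trial score with statistics of the enrollment
    embedding scored against the cohort; then t-norm: normalize with
    statistics of the (z-normalized) cohort-vs-test scores.
    """
    z = np.array([cosine(enroll, c) for c in cohort])
    s_z = (score - z.mean()) / (z.std() + 1e-12)
    t = np.array([cosine(c, test) for c in cohort])
    # Simplification: reuse the same z statistics for the t-cohort scores.
    t_z = (t - z.mean()) / (z.std() + 1e-12)
    return float((s_z - t_z.mean()) / (t_z.std() + 1e-12))

def fuse(system_scores):
    """Fuse by averaging the per-system zt-normalized scores."""
    return float(np.mean(system_scores))
```

For each trial, every one of the five systems would produce a `zt_norm` score, and `fuse` would average them into the submitted score.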