Speaker recognition with two-step multi-modal deep cleansing

Neural network-based speaker recognition has improved significantly in recent years. A robust speaker representation learns meaningful knowledge from both hard and easy samples in the training set to achieve good performance. However, noisy samples (i.e., samples with wrong labels) in the training set induce confusion and cause the network to learn incorrect representations. In this paper, we propose a two-step audio-visual deep cleansing framework to eliminate the effect of noisy labels in speaker representation learning. The framework consists of a coarse-grained cleansing step that searches for peculiar samples, followed by a fine-grained cleansing step that filters out the noisy labels. Our study starts from an efficient audio-visual speaker recognition system, which achieves close-to-perfect equal error rates (EERs) of 0.01%, 0.07% and 0.13% on the Vox-O, Vox-E and Vox-H test sets. With the proposed multi-modal cleansing mechanism, four different speaker recognition networks achieve an average improvement of 5.9%. Code is available at: https://github.com/TaoRuijie/AVCleanse
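The coarse-then-fine idea can be illustrated with a toy sketch. This is not the AVCleanse implementation: the centroid-based scoring, the equal-weight audio-visual fusion, the function names, and the parameters `coarse_ratio` and `margin` are all hypothetical assumptions chosen for illustration. The coarse step ranks samples by how well they match their own speaker's centroid and keeps the lowest-scoring fraction as peculiar candidates; the fine step marks a candidate as noisy only if some other speaker's centroid fits it better.

```python
import numpy as np

def l2_normalize(x):
    # Row-wise L2 normalisation so that dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def class_centroids(emb, labels, num_classes):
    # Mean embedding per speaker label, re-normalised to unit length.
    # Assumes every speaker has at least one sample.
    cents = np.stack([emb[labels == c].mean(axis=0) for c in range(num_classes)])
    return l2_normalize(cents)

def fused_similarity(audio, visual, labels, num_classes):
    # Cosine similarity of every sample to every speaker centroid,
    # averaged over the audio and visual modalities (hypothetical fusion rule).
    a = l2_normalize(audio)
    v = l2_normalize(visual)
    sim_a = a @ class_centroids(a, labels, num_classes).T
    sim_v = v @ class_centroids(v, labels, num_classes).T
    return 0.5 * (sim_a + sim_v)  # shape: (num_samples, num_classes)

def two_step_cleanse(audio, visual, labels, num_classes, coarse_ratio=0.1, margin=0.0):
    # Returns the sorted indices of samples judged to carry wrong labels.
    sim = fused_similarity(audio, visual, labels, num_classes)
    own = sim[np.arange(len(labels)), labels]  # similarity to the labelled speaker

    # Step 1 (coarse): keep the lowest-scoring fraction as "peculiar" candidates.
    k = max(1, int(round(coarse_ratio * len(labels))))
    candidates = np.argsort(own)[:k]

    # Step 2 (fine): a candidate is noisy only if another speaker's centroid
    # is closer than its own by more than `margin`.
    other = sim[candidates]  # fancy indexing returns a copy
    other[np.arange(len(candidates)), labels[candidates]] = -np.inf
    noisy = candidates[other.max(axis=1) > own[candidates] + margin]
    return np.sort(noisy)
```

Splitting the check in two keeps hard but correctly labelled samples: a sample can score low against its own speaker and be flagged in the coarse step, yet it is only discarded when a competing speaker explains it better in the fine step.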



Voiceprint recognition


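Speaker recognition, also called voiceprint recognition (VPR), is a biometric technology that automatically identifies a speaker from the speaker-specific information contained in speech, using computers and modern recognition techniques. The goal of speaker recognition research is to extract features from speech that characterize the speaker and to build effective models and systems that identify speakers automatically and accurately.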
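Verification systems commonly compare two speaker embeddings with cosine similarity and report the equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The sketch below assumes the embeddings have already been extracted by some network; `cosine_score` and `equal_error_rate` are hypothetical helper names, and the threshold sweep is a simple approximation rather than the scoring toolkit used in the paper.

```python
import numpy as np

def cosine_score(emb_a, emb_b):
    # Cosine similarity between two speaker embeddings.
    return float(np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))

def equal_error_rate(scores, labels):
    # scores: similarity per trial; labels: 1 = same speaker, 0 = different.
    # Assumes at least one target and one non-target trial.
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target
    best_gap, eer = np.inf, 1.0
    # Sweep every observed score as a threshold: accept the trial if score >= t.
    for t in np.unique(scores):
        accept = scores >= t
        far = np.sum(accept & (labels == 0)) / n_nontarget  # false acceptance rate
        frr = np.sum(~accept & (labels == 1)) / n_target    # false rejection rate
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)
```

With few trials the two error curves rarely cross exactly, so the sketch reports the midpoint of FAR and FRR at the threshold where they are closest.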
