This paper presents the Speech Technology Center (STC) speaker recognition (SR) systems submitted to the VOiCES From a Distance challenge 2019. The challenge's SR task focuses on speaker recognition in single-channel distant/far-field audio under noisy conditions. In this work we investigate different deep neural network architectures for speaker embedding extraction to solve the task. We show that deep networks with residual frame-level connections outperform shallower architectures. A simple energy-based speech activity detector (SAD) and an automatic speech recognition (ASR) based SAD are investigated in this work. We also address the problem of data preparation for training robust embedding extractors. Reverberation for data augmentation was performed using an automatic room impulse response generator. In our systems we used a discriminatively trained cosine similarity metric learning model as the embedding backend, and a score normalization procedure was applied to each individual subsystem. Our final submitted systems were based on the fusion of different subsystems. The results obtained on the VOiCES development and evaluation sets demonstrate the effectiveness and robustness of the proposed systems when dealing with distant/far-field audio under noisy conditions.
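The abstract mentions a cosine similarity backend combined with score normalization. As a minimal sketch of these two generic components (not the authors' trained metric-learning model), the following shows plain cosine scoring between embeddings and symmetric score normalization (S-norm) against cohort score distributions; the function names and cohort handling are illustrative assumptions, not part of the paper.

```python
import numpy as np

def cosine_score(e1, e2):
    # Cosine similarity between two speaker embedding vectors.
    return float(np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2)))

def s_norm(score, enroll_cohort_scores, test_cohort_scores):
    # Symmetric score normalization (S-norm): average of the raw score
    # z-normalized against the enrollment-side and test-side cohort
    # score distributions. Cohort arrays hold scores of each utterance
    # against a set of impostor cohort utterances.
    zn = (score - enroll_cohort_scores.mean()) / enroll_cohort_scores.std()
    tn = (score - test_cohort_scores.mean()) / test_cohort_scores.std()
    return 0.5 * (zn + tn)
```

In practice the normalized scores of each subsystem would then be combined in a score-level fusion, as the abstract describes for the final submissions.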