The NIST CTS Speaker Recognition Challenge

The US National Institute of Standards and Technology (NIST) has been conducting a second iteration of the CTS Challenge since August 2020. The current iteration is a leaderboard-style speaker recognition evaluation using telephony data extracted from the unexposed portions of the Call My Net 2 (CMN2) and Multi-Language Speech (MLS) corpora collected by the LDC. It is organized in a similar manner to the SRE19 CTS Challenge, offering only an open training condition and using two evaluation subsets, Progress and Test. Unlike in the SRE19 CTS Challenge, no training or development set was initially released, and NIST has publicly released the leaderboards on both subsets. Participants do not know which subset (Progress or Test) a trial belongs to, and each system submission must contain outputs for all of the trials. The CTS Challenge has served, and will continue to serve, as a prerequisite for entrance to the regular SREs (such as SRE21). Since August 2020, a total of 53 organizations (forming 33 teams) from academia and industry have participated in the CTS Challenge and submitted more than 4400 valid system outputs. This paper presents an overview of the evaluation and several analyses of system performance for some primary conditions in the CTS Challenge. The results thus far indicate remarkable improvements in performance due to 1) speaker embeddings extracted using large-scale and complex neural network architectures such as ResNets, trained with angular margin losses, 2) extensive data augmentation, 3) the use of large amounts of in-house proprietary data from a large number of labeled speakers, and 4) long-duration fine-tuning.
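
The abstract credits angular margin losses without saying which variant the participating systems used. As a rough illustration only, the sketch below assumes the additive angular margin form (a margin added to the angle of the true speaker class before a scaled softmax). The function name, the margin of 0.2, and the scale of 30 are hypothetical choices for this example and are not taken from the paper.

```python
import math


def aam_softmax_loss(cosines, target, margin=0.2, scale=30.0):
    """Cross-entropy over additive-angular-margin logits for one example.

    cosines: cosine similarity between an embedding and each speaker-class weight.
    target:  index of the true speaker class.
    margin, scale: illustrative values (assumptions, not from the paper).

    Simplification: no special handling when theta + margin exceeds pi.
    """
    logits = []
    for i, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding error
        if i == target:
            # Add the margin to the true-class angle, which lowers its logit
            # and forces the embedding closer to its class weight.
            c = math.cos(math.acos(c) + margin)
        logits.append(scale * c)

    # Numerically stable log-sum-exp, then negative log-likelihood of the target.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


if __name__ == "__main__":
    cosines = [0.8, 0.1, -0.3]
    print(aam_softmax_loss(cosines, target=0, margin=0.0))  # plain scaled softmax
    print(aam_softmax_loss(cosines, target=0, margin=0.2))  # with angular margin
```

With the margin set to zero this reduces to an ordinary scaled softmax cross-entropy, so the margin only makes the training objective stricter for the true class.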

Related content

Voiceprint Recognition

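Speaker recognition, also called voiceprint recognition (VPR), is a biometric technology that automatically determines a speaker's identity from the speaker-specific information contained in speech, using computers and modern information recognition techniques. The goal of speaker recognition research is to extract features from speech that characterize the speaker, and to build effective models and systems that identify speakers automatically and accurately.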