Noise suppression models running in production environments are commonly trained on publicly available datasets. However, this approach leads to regressions because the models are neither trained nor tested on representative customer data. Moreover, for privacy reasons, developers cannot listen to customer content. This `ears-off' situation motivates augmenting existing datasets in a privacy-preserving manner. In this paper, we present Aura, a solution that makes existing noise suppression test sets more challenging and diverse while remaining sample efficient. Aura is `ears-off' because it relies on a feature extractor and a speech quality metric, DNSMOS P.835, both pre-trained on data obtained from public sources. As an application of Aura, we augment the INTERSPEECH 2021 DNS challenge by sampling audio files from a new batch of 20K clean speech clips from Librivox mixed with noise clips obtained from AudioSet. Aura makes the existing benchmark test set harder by 0.27 (7%) in DNSMOS P.835 OVRL and by 0.64 (16%) in DNSMOS P.835 SIG, increases diversity by 31%, and achieves a 26% improvement in Spearman's rank correlation coefficient (SRCC) compared to random sampling. Finally, we open-source Aura to stimulate research on test set development.
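To make the sampling idea concrete, the following is a minimal, hypothetical sketch (not the released Aura code) of selecting clips that are both hard (low predicted DNSMOS P.835 OVRL) and diverse (far apart in a pre-trained embedding space); the function name, the trade-off weight `alpha`, and the greedy strategy are illustrative assumptions only.

```python
# Hypothetical sketch: greedy, sample-efficient selection of clips that are
# hard (low predicted DNSMOS P.835 OVRL) and diverse (spread out in a
# pre-trained embedding space). Not the paper's released implementation.
import numpy as np

def select_clips(embeddings, dnsmos_ovrl, k, alpha=0.5):
    """Pick k clip indices trading off difficulty against diversity.

    embeddings  : (N, D) array of per-clip features from a pre-trained extractor.
    dnsmos_ovrl : (N,) array of predicted DNSMOS P.835 OVRL scores (higher = easier).
    k           : number of clips to select.
    alpha       : assumed weight between the difficulty and diversity terms.
    """
    # Normalise quality to [0, 1]; difficulty is its complement.
    quality = (dnsmos_ovrl - dnsmos_ovrl.min()) / (np.ptp(dnsmos_ovrl) + 1e-9)
    difficulty = 1.0 - quality

    selected = [int(np.argmax(difficulty))]  # start with the hardest clip
    min_dist = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)

    while len(selected) < k:
        # Diversity of each candidate = distance to its nearest selected clip.
        diversity = min_dist / (min_dist.max() + 1e-9)
        gain = alpha * difficulty + (1 - alpha) * diversity
        gain[selected] = -np.inf  # never re-pick an already selected clip
        nxt = int(np.argmax(gain))
        selected.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        )
    return selected
```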