Domain mismatch is a noteworthy issue in acoustic event detection tasks, as target-domain data is difficult to obtain in most real applications. In this study, we propose a novel CNN-based discriminative training framework as a domain compensation method to handle this issue. It uses a pair of parallel CNN-based discriminators to learn high-level intermediate acoustic representations. Together with a binary discriminative loss, the discriminators are forced to maximally exploit the discrimination of heterogeneous acoustic information in each audio clip containing target events, which results in a robust pair of representations that can discriminate the target events and the background/domain variations separately. Moreover, to better learn the transient characteristics of target events, a frame-wise classifier is designed to perform the final classification. In addition, a two-stage training scheme with CNN-based discriminator initialization is further proposed to enhance system training. All experiments are performed on the DCASE 2018 Task 3 dataset. Results show that our proposal significantly outperforms the official baseline under cross-domain conditions by a relative $1.8-12.1$% in AUC, without any performance degradation under in-domain evaluation conditions.
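To make the described architecture concrete, the following is a minimal PyTorch-style sketch of paired parallel CNN discriminators trained with a binary discriminative loss alongside a frame-wise classifier. It assumes log-mel spectrogram input; the layer sizes, module names (CNNBranch, PairedDiscriminativeModel), and the exact form of the binary discriminative loss are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: two parallel CNN branches produce paired frame-level
# representations; a binary head is trained to tell the two branches apart
# (binary discriminative loss), and a frame-wise classifier predicts events.
# All hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn


class CNNBranch(nn.Module):
    """One CNN discriminator branch producing a per-frame high-level representation."""

    def __init__(self, n_mels=64, channels=32, embed_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),          # pool along frequency only
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),
        )
        self.proj = nn.Linear(channels * (n_mels // 4), embed_dim)

    def forward(self, x):                               # x: (batch, 1, frames, n_mels)
        h = self.conv(x)                                # (batch, C, frames, n_mels // 4)
        h = h.permute(0, 2, 1, 3).flatten(2)            # (batch, frames, C * n_mels // 4)
        return self.proj(h)                             # (batch, frames, embed_dim)


class PairedDiscriminativeModel(nn.Module):
    """Parallel branches: one for target events, one for background/domain variations."""

    def __init__(self, n_mels=64, embed_dim=128, n_classes=1):
        super().__init__()
        self.event_branch = CNNBranch(n_mels, embed_dim=embed_dim)
        self.domain_branch = CNNBranch(n_mels, embed_dim=embed_dim)
        self.frame_classifier = nn.Linear(embed_dim, n_classes)  # frame-wise event classifier
        self.binary_head = nn.Linear(embed_dim, 1)                # binary discriminative head

    def forward(self, x):
        e = self.event_branch(x)                        # event-related representation
        d = self.domain_branch(x)                       # background/domain representation
        frame_logits = self.frame_classifier(e)         # (batch, frames, n_classes)

        # Binary discriminative loss: push the paired representations apart by
        # asking a shared head to classify which branch each frame came from.
        both = torch.cat([e, d], dim=1)                 # (batch, 2 * frames, embed_dim)
        branch_logits = self.binary_head(both).squeeze(-1)
        branch_labels = torch.cat(
            [torch.ones(e.shape[:2]), torch.zeros(d.shape[:2])], dim=1
        )
        disc_loss = nn.functional.binary_cross_entropy_with_logits(
            branch_logits, branch_labels
        )
        return frame_logits, disc_loss


if __name__ == "__main__":
    model = PairedDiscriminativeModel()
    log_mel = torch.randn(4, 1, 240, 64)                # 4 clips, 240 frames, 64 mel bins
    frame_logits, disc_loss = model(log_mel)
    print(frame_logits.shape, disc_loss.item())         # torch.Size([4, 240, 1]), scalar loss
```

In a two-stage setup of the kind mentioned above, one would presumably first train the branches with the binary discriminative loss to initialize the discriminators, then continue training with the frame-wise event loss added; the exact schedule and loss weighting are not specified here and would follow the paper.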