Few-shot learning systems for sound event recognition have attracted interest because they need only a few examples to adapt to new target classes without fine-tuning. However, such systems have so far been applied only to chunks of sound, for classification or verification. In this paper, we aim to achieve few-shot detection of rare sound events in long query sequences that contain not only the target events but also other events and background noise. This requires suppressing false positives triggered by both the other events and the background noise. We propose metric learning with a background noise class for few-shot detection. Our contributions are the explicit inclusion of background noise as an independent class, a suitable loss function that emphasizes this additional class, and a corresponding sampling strategy that assists training. Together, these yield a feature space in which the event classes and the background noise class are sufficiently separated. Evaluations on few-shot detection tasks, using DCASE 2017 task2 and ESC-50, show that our proposed method outperforms metric learning that does not consider the background noise class. Its few-shot detection performance is also comparable to that of the DCASE 2017 task2 baseline system, which requires a large amount of annotated audio data.
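To make the idea of a background noise class concrete, the following is a minimal sketch and not the method evaluated in the paper. The abstract does not specify the distance metric, the loss, or the sampling strategy, so the sketch assumes a prototype-based formulation with squared Euclidean distance and a simple per-class loss weight. All names (`mean_vector`, `class_posteriors`, `weighted_loss`, `detect`, `background_weight`) and the toy 2-D embeddings are hypothetical.

```python
import math

BACKGROUND = "background"  # hypothetical label for the extra noise class


def mean_vector(vectors):
    """Average a list of equal-length embeddings into one class prototype."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def squared_distance(a, b):
    """Squared Euclidean distance (an assumed metric, not from the paper)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_posteriors(query, prototypes):
    """Softmax over negative squared distances to every class prototype."""
    logits = {c: -squared_distance(query, p) for c, p in prototypes.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(v - top) for c, v in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def weighted_loss(query, label, prototypes, background_weight=2.0):
    """Cross-entropy, scaled up when the true class is background.

    The weighting scheme is an illustrative assumption about how a loss
    could emphasize the background class.
    """
    weight = background_weight if label == BACKGROUND else 1.0
    return -weight * math.log(class_posteriors(query, prototypes)[label])


def detect(frames, prototypes, target, threshold=0.5):
    """Flag each frame whose posterior for the target class exceeds threshold."""
    return [class_posteriors(f, prototypes)[target] > threshold for f in frames]


# Toy support set: a few embeddings per class, including background noise.
support = {
    "target_event": [[4.0, 0.0], [6.0, 0.0]],
    "other_event": [[0.0, 5.0], [0.0, 7.0]],
    BACKGROUND: [[-1.0, 0.0], [1.0, 0.0]],
}
prototypes = {c: mean_vector(vs) for c, vs in support.items()}

# A long query sequence: background, then the target, then another event.
frames = [[0.1, 0.2], [4.8, 0.1], [0.2, 5.9]]
detections = detect(frames, prototypes, "target_event")
print(detections)  # [False, True, False]
```

Because background noise has its own prototype, a noise frame is assigned to that class and does not have to be matched to the nearest event class. This is the false-positive case the abstract targets.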