This paper presents a low-complexity framework for acoustic scene classification (ASC). Most frameworks designed for ASC use convolutional neural networks (CNNs) because they learn features directly from data and outperform approaches based on hand-engineered features. However, CNNs are resource-hungry owing to their large size and high computational complexity, which makes them difficult to deploy on resource-constrained devices. This paper addresses the problem of reducing the computational complexity and memory requirements of CNNs. We propose a low-complexity CNN architecture and apply pruning and quantization to further reduce its number of parameters and its memory footprint. We then propose an ensemble framework that combines several low-complexity CNNs to improve overall performance. The proposed framework is evaluated experimentally on the publicly available dataset of DCASE 2022 Task 1, which focuses on ASC. The proposed ensemble framework has approximately 60K parameters, requires 19M multiply-accumulate operations (MACs), and improves performance by approximately 2-4 percentage points over the DCASE 2022 Task 1 baseline network.
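The abstract names the complexity measures (parameters, MACs, memory) and the reduction steps (pruning, quantization, ensembling) without giving details. The plain-Python sketch below shows generic textbook versions of these ideas. It is not the paper's method: unstructured magnitude pruning, symmetric per-tensor 8-bit quantization, and averaging of class probabilities are all assumptions, since the abstract does not state the pruning criterion, bit width, or fusion rule. All function names are hypothetical.

```python
"""Hypothetical helpers illustrating generic low-complexity CNN techniques.

These are NOT the paper's implementation; pruning criterion, bit width and
ensemble fusion rule are assumptions chosen for illustration.
"""


def conv2d_cost(c_in: int, c_out: int, k: int, h_out: int, w_out: int,
                bias: bool = True) -> tuple[int, int]:
    """Return (parameters, MACs) of a standard k x k 2-D convolution layer."""
    params = c_out * c_in * k * k + (c_out if bias else 0)
    # One multiply-accumulate per kernel weight, per output position.
    macs = h_out * w_out * c_out * c_in * k * k
    return params, macs


def memory_bytes(num_params: int, bits: int) -> int:
    """Storage needed for num_params weights at the given bit width."""
    return num_params * bits // 8


def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Unstructured pruning (assumed): zero the smallest-|w| fraction."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor 8-bit quantization (assumed); returns codes, scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def ensemble_average(prob_lists: list[list[float]]) -> list[float]:
    """Fuse member models by averaging their class-probability vectors."""
    n = len(prob_lists)
    return [sum(p[c] for p in prob_lists) / n for c in range(len(prob_lists[0]))]


if __name__ == "__main__":
    print(conv2d_cost(c_in=16, c_out=32, k=3, h_out=32, w_out=32))
    print(memory_bytes(60_000, 8), memory_bytes(60_000, 32))
    w = magnitude_prune([0.5, -0.1, 0.05, -0.8], sparsity=0.5)
    print(w, quantize_int8(w)[0])
    print(ensemble_average([[0.7, 0.3], [0.5, 0.5]]))
```

For example, `conv2d_cost(16, 32, 3, 32, 32)` gives 4,640 parameters and 4,718,592 MACs for a single layer. Storing 60K weights takes 60,000 bytes at 8 bits versus 240,000 bytes at 32 bits. Note that an ensemble's parameter and MAC counts are the sums over its member networks. The combined budget therefore has to cover every member, which is why each member must itself be low-complexity.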