Neural network-based methods have recently demonstrated state-of-the-art results on image synthesis and super-resolution tasks, in particular by using variants of generative adversarial networks (GANs) with supervised feature losses. Nevertheless, previous feature loss formulations rely on the availability of large auxiliary classifier networks, and labeled datasets that enable such classifiers to be trained. Furthermore, there has been comparatively little work to explore the applicability of GAN-based methods to domains other than images and video. In this work we explore a GAN-based method for audio processing, and develop a convolutional neural network architecture to perform audio super-resolution. In addition to several new architectural building blocks for audio processing, a key component of our approach is the use of an autoencoder-based loss that enables training in the GAN framework, with feature losses derived from unlabeled data. We explore the impact of our architectural choices, and demonstrate significant improvements over previous works in terms of both objective and perceptual quality.
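The autoencoder-based feature loss mentioned above can be illustrated with a minimal sketch. The abstract does not give the loss formulation, so everything here is an assumption: the helper names (`conv1d`, `encode`, `feature_loss`), the two-layer strided convolutional encoder, and the choice of mean squared error over intermediate activations are hypothetical. The random kernels merely stand in for the encoder half of an autoencoder that would be trained to reconstruct unlabeled audio.

```python
import numpy as np

def conv1d(x, kernels, stride=2):
    """Strided 1-D convolution without padding.

    x: array of shape (channels_in, length)
    kernels: array of shape (channels_out, channels_in, width)
    """
    c_out, c_in, width = kernels.shape
    n_out = (x.shape[1] - width) // stride + 1
    out = np.zeros((c_out, n_out))
    for t in range(n_out):
        window = x[:, t * stride : t * stride + width]  # (c_in, width)
        out[:, t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out

def encode(x, layers):
    """Run x through the encoder and return the activations of every layer."""
    feats = []
    h = x
    for kernels in layers:
        h = np.maximum(conv1d(h, kernels), 0.0)  # convolution followed by ReLU
        feats.append(h)
    return feats

def feature_loss(generated, target, layers):
    """Sum over layers of the mean squared difference between encoder features."""
    feats_g = encode(generated, layers)
    feats_t = encode(target, layers)
    return sum(np.mean((g - t) ** 2) for g, t in zip(feats_g, feats_t))

# Placeholder weights (assumption): a real setup would load the encoder of an
# autoencoder trained on unlabeled audio instead of using random kernels.
rng = np.random.default_rng(0)
layers = [
    0.1 * rng.normal(size=(4, 1, 8)),  # 1 input channel -> 4 feature channels
    0.1 * rng.normal(size=(8, 4, 8)),  # 4 -> 8 feature channels
]

# Toy signals: a clean sine wave as the high-resolution target and a noisy
# copy standing in for the generator output.
t_axis = np.linspace(0.0, 1.0, 256)
target = np.sin(2 * np.pi * 8 * t_axis)[None, :]
generated = target + 0.1 * rng.normal(size=target.shape)

loss = feature_loss(generated, target, layers)
```

In a GAN setup, a term of this kind would typically be added to the generator's adversarial loss. Because the encoder is learned by reconstruction, no labels or auxiliary classifier are required.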