We address the challenging open problem of learning an effective latent space for symbolic music data in generative music modeling. We focus on leveraging adversarial regularization as a flexible and natural means to imbue variational autoencoders with context information concerning music genre and style. Throughout the paper, we show how Gaussian mixtures that take music metadata into account can be used as an effective prior for the autoencoder latent space, introducing the first Music Adversarial Autoencoder (MusAE). The empirical analysis on a large-scale benchmark shows that our model has a higher reconstruction accuracy than state-of-the-art models based on standard variational autoencoders. It is also able to create realistic interpolations between two musical sequences, smoothly changing the dynamics of the different tracks. Experiments show that the model can organise its latent space according to low-level properties of the musical pieces, and can embed into the latent variables the high-level genre information injected from the prior distribution to increase its overall performance. This allows us to make changes to the generated pieces in a principled way.
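As a rough illustration of two ideas mentioned above, the following minimal sketch shows one possible way to sample from a genre-conditioned Gaussian mixture prior and to interpolate between two latent codes. It is not the MusAE implementation: the function names, the random placement of component means, the unit variance, and the use of linear interpolation are all assumptions made for illustration, and the encoder, decoder, and discriminator networks are omitted entirely.

```python
import random


def make_genre_prior(genres, latent_dim, radius=4.0, seed=0):
    """Assign each genre one mixture-component mean (hypothetical layout).

    Assumption: means are drawn uniformly in [-radius, radius]; the source
    does not say how the mixture components are placed.
    """
    rng = random.Random(seed)
    return {
        genre: [rng.uniform(-radius, radius) for _ in range(latent_dim)]
        for genre in genres
    }


def sample_prior(means, genre, sigma=1.0, rng=random):
    """Draw one latent vector from the Gaussian component tied to `genre`."""
    return [rng.gauss(mu, sigma) for mu in means[genre]]


def interpolate(z_a, z_b, steps):
    """Linearly interpolate between two latent codes, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        path.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


if __name__ == "__main__":
    means = make_genre_prior(["rock", "jazz"], latent_dim=8)
    z_rock = sample_prior(means, "rock")
    z_jazz = sample_prior(means, "jazz")
    # Each point on the path would be passed to a decoder to obtain music.
    for z in interpolate(z_rock, z_jazz, steps=5):
        print([round(v, 2) for v in z])
```

In an adversarial autoencoder, samples such as those returned by `sample_prior` would serve as the "real" examples for a discriminator, while encoder outputs would be the "fake" ones; decoding each point returned by `interpolate` would yield the intermediate musical sequences.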