Prevailing Research Areas for Music AI in the Era of Foundation Models

Parallel to rapid advancements in foundation model research, the past few years have witnessed a surge in music AI applications. As AI-generated and AI-augmented music become increasingly mainstream, many researchers in the music AI community may wonder: what research frontiers remain unexplored? This paper outlines several key areas within music AI research that present significant opportunities for further investigation. We begin by examining foundational representation models and highlight emerging efforts toward explainability and interpretability. We then discuss the evolution toward multimodal systems, provide an overview of the current landscape of music datasets and their limitations, and address the growing importance of model efficiency in both training and deployment. Next, we explore applied directions, focusing first on generative models. We review recent systems, their computational constraints, and persistent challenges related to evaluation and controllability. We then examine extensions of these generative approaches to multimodal settings and their integration into artists' workflows, including applications in music editing, captioning, production, transcription, source separation, performance, discovery, and education. Finally, we explore copyright implications of generative music and propose strategies to safeguard artist rights. While not exhaustive, this survey aims to illuminate promising research directions enabled by recent developments in music foundation models.
