Recent advances in AI-based music generation have focused heavily on text-conditioned models, with less attention given to reference-based generation such as song adaptation. To support this line of research, we introduce LargeSHS, a large-scale dataset derived from SecondHandSongs, containing over 1.7 million metadata entries and approximately 900k publicly accessible audio links. Unlike existing datasets, LargeSHS includes structured adaptation relationships between musical works, enabling the construction of adaptation trees and performance clusters that represent cover song families. We provide comprehensive statistics and comparisons with existing datasets, highlighting the unique scale and richness of LargeSHS. This dataset paves the way for new research in cover song generation, reference-based music generation, and adaptation-aware MIR tasks.
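As an illustration of the structured adaptation relationships described above, the following minimal sketch shows one plausible way to build adaptation trees and group performances into cover song families. The record layout, the identifiers (`p1`, `w1`, and so on), and the function names are hypothetical and are not taken from the LargeSHS schema; the sketch assumes only that each performance belongs to one work and that each adapted work links to at most one source work.

```python
from collections import defaultdict

# Hypothetical records: (performance_id, work_id). Not the LargeSHS schema.
performances = [
    ("p1", "w1"),
    ("p2", "w1"),
    ("p3", "w2"),
    ("p4", "w3"),
]

# Hypothetical adaptation edges: adapted work -> the work it derives from.
adapted_from = {"w2": "w1", "w3": "w2"}


def root_of(work_id, parents):
    """Follow adaptation links upward until reaching a work with no parent."""
    seen = set()
    while work_id in parents:
        if work_id in seen:
            raise ValueError(f"cycle in adaptation links at {work_id}")
        seen.add(work_id)
        work_id = parents[work_id]
    return work_id


def build_children(parents):
    """Invert child -> parent links into parent -> sorted list of children."""
    children = defaultdict(list)
    for child, parent in parents.items():
        children[parent].append(child)
    return {parent: sorted(kids) for parent, kids in children.items()}


def cluster_performances(perfs, parents):
    """Group performance ids by the root work of their adaptation tree."""
    clusters = defaultdict(list)
    for perf_id, work_id in perfs:
        clusters[root_of(work_id, parents)].append(perf_id)
    return dict(clusters)


tree = build_children(adapted_from)
clusters = cluster_performances(performances, adapted_from)
```

Under these assumptions, `tree` maps each source work to its direct adaptations, and `clusters` collects every performance whose work traces back to the same root work. Grouping by root is only one possible definition of a cover song family; a finer grouping per work would keep each adaptation as its own cluster.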