High-quality datasets for learning-based modelling of polyphonic symbolic music remain less readily accessible at scale than datasets in other domains, such as language modelling or image classification. In particular, datasets that contain information about human responses to the given music samples are rare. The issue of scale persists as a general hindrance to breakthroughs in the field, while the lack of listener evaluation is especially relevant to generative modelling, where clear objective metrics that correlate strongly with qualitative success remain elusive. We propose the JS Fake Chorales, a dataset of 500 pieces generated by a new learning-based algorithm, provided in MIDI form. We take consecutive outputs from the algorithm and avoid cherry-picking in order to validate the potential to scale this dataset further on demand. We conduct an online experiment for human evaluation, designed to be as fair to the listener as possible, and find that respondents were on average only 7% better than random guessing at distinguishing JS Fake Chorales from real chorales composed by JS Bach. Furthermore, alongside the MIDI samples, we release anonymised data collected from the experiments, such as each respondent's musical experience and how long they took to submit their response for each sample. Finally, we conduct ablation studies to demonstrate the effectiveness of using the synthetic pieces for research in polyphonic music modelling. Using a known algorithm, we find that we can improve on the state-of-the-art validation set loss for the canonical JSB Chorales dataset simply by augmenting the training set with the JS Fake Chorales.
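The augmentation step described above leaves the validation split alone, so the reported loss stays comparable with models trained on real chorales only. A minimal sketch of that idea, assuming each piece has already been converted from MIDI into a sequence of chords (tuples of MIDI pitch numbers); the function name `augment_training_set` and this representation are hypothetical, not the authors' code:

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: a piece is a tuple of chords,
# and each chord is a tuple of MIDI pitch numbers.
Piece = Tuple[Tuple[int, ...], ...]


def augment_training_set(
    real_train: Sequence[Piece], synthetic: Sequence[Piece], seed: int = 0
) -> List[Piece]:
    """Return a shuffled training set of real plus synthetic pieces.

    Only the training split is augmented; validation and test splits are
    left untouched so that their losses remain comparable across runs.
    """
    combined = list(real_train) + list(synthetic)
    rng = random.Random(seed)  # seeded for reproducible ordering
    rng.shuffle(combined)
    return combined


# Toy usage with made-up pieces (not real dataset content).
real_train = [((60, 64, 67), (62, 65, 69))]
fake = [((59, 62, 67),), ((60, 64, 67),)]
valid = [((57, 60, 64),)]
train = augment_training_set(real_train, fake)
```

Shuffling the combined list, rather than appending the synthetic pieces at the end, avoids presenting all generated material in a single contiguous block during training.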