JS Fake Chorales: a Synthetic Dataset of Polyphonic Music with Human Annotation

High-quality datasets for learning-based modelling of polyphonic symbolic music remain less readily accessible at scale than in other domains, such as language modelling or image classification. In particular, datasets that reveal how humans respond to the given music samples are rare. The issue of scale persists as a general hindrance to breakthroughs in the field, while the lack of listener evaluation is especially relevant to the generative modelling problem space, where clear objective metrics that correlate strongly with qualitative success remain elusive. We propose the JS Fake Chorales, a dataset of 500 pieces generated by a new learning-based algorithm, provided in MIDI form. We take consecutive outputs from the algorithm and avoid cherry-picking in order to validate the potential to scale this dataset further on demand. We conduct an online experiment for human evaluation, designed to be as fair to the listener as possible, and find that respondents were on average only 7% better than random guessing at distinguishing JS Fake Chorales from real chorales composed by JS Bach. Furthermore, we make anonymised data collected from the experiments available along with the MIDI samples, such as the respondents' musical experience and how long they took to submit their response for each sample. Finally, we conduct ablation studies to demonstrate the effectiveness of using the synthetic pieces for research in polyphonic music modelling, and find that we can improve on state-of-the-art validation set loss for the canonical JSB Chorales dataset, using a known algorithm, by simply augmenting the training set with the JS Fake Chorales.
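The augmentation setup mentioned at the end of the abstract can be illustrated with a minimal sketch. It assumes the pieces have already been parsed from MIDI into lists of four-voice pitch tuples; that representation, the `Piece` alias, and the function name `augment_training_set` are all hypothetical and are not taken from the paper. The only point shown is that synthetic pieces are merged into the training split, while any validation split is left as real data only.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: one (soprano, alto, tenor, bass) MIDI pitch
# tuple per time step. The actual dataset is distributed as MIDI files.
Piece = List[Tuple[int, int, int, int]]


def augment_training_set(
    real_train: Sequence[Piece],
    synthetic: Sequence[Piece],
    seed: int = 0,
) -> List[Piece]:
    """Return a shuffled training list containing real and synthetic pieces.

    The inputs are not modified. The validation set is deliberately not a
    parameter here: it should stay real-only so that losses remain comparable.
    """
    combined = list(real_train) + list(synthetic)
    # A seeded local generator keeps the shuffle reproducible.
    random.Random(seed).shuffle(combined)
    return combined
```

A model would then be trained on the returned list and evaluated on the untouched validation split of the real chorales.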

