In this paper, we propose and develop the idea of treating sheet music as literary documents in the sense of traditional text analytics, so as to benefit from the large body of existing research in statistical text mining and topic modelling. Specifically, we represent any given piece of music as a collection of "musical words" of various lengths, which we call "muselets". Because the idea is new and a complete dictionary of muselets is therefore extremely difficult to construct properly, the present paper focuses on a simpler, admittedly naive version of the ultimate dictionary, which we refer to as the Naive Dictionary because all of its words have the same length. We construct a Naive Dictionary from a corpus of African American, Chinese, Japanese and Arabic music, on which we perform both topic modelling and pattern recognition. Although some of the results based on the Naive Dictionary are reasonably good, we anticipate much stronger predictive performance once a full-scale, complete version of the intended dictionary of muselets has been built.
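To make the fixed-length idea concrete, the following is a minimal sketch of how a naive dictionary of equal-length musical words might be built and turned into per-piece word counts, which is the kind of input that topic models and classifiers consume. Everything specific here is an assumption made for illustration and is not taken from the paper: the encoding of a piece as a list of pitch names, the overlapping sliding window, the word length of 3, the toy pieces, and the function names `fixed_length_muselets` and `bag_of_muselets`.

```python
from collections import Counter


def fixed_length_muselets(notes, length=3):
    """Split a note sequence into overlapping words of exactly `length` notes.

    Assumption: a piece is a list of pitch-name strings, and a "word" is a
    run of consecutive notes joined by "-". The paper may define words
    differently.
    """
    return ["-".join(notes[i:i + length])
            for i in range(len(notes) - length + 1)]


def bag_of_muselets(pieces, length=3):
    """Build a shared vocabulary and one count vector per piece."""
    words_per_piece = {name: fixed_length_muselets(notes, length)
                       for name, notes in pieces.items()}
    # The naive dictionary: every distinct fixed-length word in the corpus.
    vocab = sorted({w for words in words_per_piece.values() for w in words})
    index = {w: j for j, w in enumerate(vocab)}
    counts = {}
    for name, words in words_per_piece.items():
        row = [0] * len(vocab)
        for word, n in Counter(words).items():
            row[index[word]] = n
        counts[name] = row
    return vocab, counts


# Hypothetical toy corpus, for illustration only.
toy_pieces = {
    "piece_a": ["C4", "D4", "E4", "C4", "D4", "E4"],
    "piece_b": ["A4", "G4", "E4", "D4"],
}
vocab, counts = bag_of_muselets(toy_pieces, length=3)
```

The resulting count rows play the role of a document-term matrix, so a standard topic model or classifier could in principle be applied to them unchanged. A dictionary with words of varying length would need a segmentation rule in place of the fixed window.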