Studies on time-series data mining, and on time-series clustering in particular, have increased recently because time series are abundant across many domains. The large volume of time-series data makes techniques such as clustering necessary for understanding the data and extracting information and hidden patterns. In clustering, and in time-series clustering specifically, the most important aspects are the similarity measure used and the algorithm employed to perform the clustering. In this paper, a new similarity measure for time-series clustering is developed based on a combination of a simple representation of time series, the slope of each segment of the time series, the Euclidean distance, and dynamic time warping. It is proved in this paper that the proposed distance measure is a metric, and thus indexing can be applied. For the clustering task, the Particle Swarm Optimization algorithm is employed. The proposed similarity measure is compared with three existing measures in terms of various criteria used to evaluate clustering algorithms. The results indicate that the proposed similarity measure outperforms the others on almost every dataset used in this paper.
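To make the ingredients named above concrete, the following is a minimal sketch of how segment slopes, the Euclidean distance, and dynamic time warping could be combined. It is not the measure proposed in the paper, whose exact definition the abstract does not give. The function names (`segment_features`, `dtw`, `slope_dtw_distance`), the equal-width segmentation, the (mean, slope) feature pair, and the default `n_segments=4` are all assumptions made for illustration. Plain dynamic time warping is not a metric in general, so this sketch makes no claim to the metric property proved in the paper.

```python
import math


def segment_features(series, n_segments):
    """Split a series into near-equal segments and return (mean, slope) per segment.

    Assumption: equal-width segments with an end-to-end slope; the paper's
    actual representation may differ.
    """
    n = len(series)
    if n_segments < 1 or n_segments > n:
        raise ValueError("n_segments must be between 1 and len(series)")
    features = []
    for k in range(n_segments):
        start = k * n // n_segments
        end = (k + 1) * n // n_segments
        chunk = series[start:end]
        mean = sum(chunk) / len(chunk)
        # Rise over run across the segment; a single-point segment has slope 0.
        slope = (chunk[-1] - chunk[0]) / (len(chunk) - 1) if len(chunk) > 1 else 0.0
        features.append((mean, slope))
    return features


def dtw(a, b):
    """Classic dynamic time warping with a Euclidean local cost between feature tuples."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def slope_dtw_distance(x, y, n_segments=4):
    """Hypothetical combined distance: DTW over per-segment (mean, slope) features."""
    return dtw(segment_features(x, n_segments), segment_features(y, n_segments))
```

A pairwise distance of this kind could then be supplied to any clustering procedure that works from distances; the paper itself uses Particle Swarm Optimization for that step, which is not sketched here.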