In the last fifteen years, data series motif and discord discovery have emerged as two useful and well-used primitives for data series mining, with applications in many domains, including robotics, entomology, seismology, medicine, and climatology. Nevertheless, state-of-the-art motif and discord discovery tools still require the user to provide the motif or discord length up front. Yet in several cases the choice of length is critical and unforgiving. Unfortunately, the obvious brute-force solution, which tests all lengths within a given range, is computationally untenable. In this work, we introduce a new framework that provides an exact and scalable motif and discord discovery algorithm, which efficiently finds all motifs and discords in a given range of lengths. We evaluate our approach with five diverse real datasets and demonstrate that it is up to 20 times faster than the state of the art. Our results also show that removing the unrealistic assumption that the user knows the correct length can often produce more intuitive and actionable results, which could otherwise have been missed. (Paper published in Data Mining and Knowledge Discovery, 2020)
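To make the cost of the brute-force baseline concrete, the following is a minimal sketch of it in Python, not the framework introduced in this work. It assumes z-normalized Euclidean distance and treats any two overlapping subsequences as trivial matches; both choices, along with the names `znorm`, `search_one_length`, and `brute_force_range`, are illustrative assumptions rather than details taken from the paper.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def search_one_length(series, m):
    """Exhaustive motif and discord search for a single length m.

    Returns (motif, discord), where
      motif   = (distance, i, j): closest pair of non-overlapping subsequences
      discord = (distance, i):    subsequence whose nearest non-overlapping
                                  neighbor is farthest away
    Assumption: overlapping pairs (|i - j| < m) are skipped as trivial matches.
    """
    n_sub = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n_sub)]
    nearest = [math.inf] * n_sub  # nearest-neighbor distance per subsequence
    motif = (math.inf, -1, -1)
    for i in range(n_sub):
        for j in range(i + m, n_sub):
            d = math.dist(subs[i], subs[j])
            if d < motif[0]:
                motif = (d, i, j)
            nearest[i] = min(nearest[i], d)
            nearest[j] = min(nearest[j], d)
    # Ignore subsequences that had no non-overlapping partner at all.
    candidates = [(d, i) for i, d in enumerate(nearest) if math.isfinite(d)]
    discord = max(candidates) if candidates else (math.inf, -1)
    return motif, discord


def brute_force_range(series, l_min, l_max):
    """Repeat the exhaustive search independently for every length in the range."""
    return {m: search_one_length(series, m) for m in range(l_min, l_max + 1)}
```

Each call to `search_one_length` compares on the order of n² subsequence pairs at a cost proportional to m each, and `brute_force_range` repeats that work from scratch for every length. Nothing is shared between lengths, which is why this approach does not scale to long series or wide length ranges.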