This paper presents a framework for the exact discovery of the top-k sequential patterns under leverage. It combines (1) a novel definition of the expected support of a sequential pattern, a concept on which most interestingness measures directly rely, with (2) SkOPUS, a new branch-and-bound algorithm for the exact discovery of top-k sequential patterns under a given measure of interest. Our interestingness measure employs the partition approach: a pattern is interesting to the extent that it is more frequent than can be explained by assuming independence between any of the pairs of patterns from which it can be composed. The larger the support relative to the expectation under independence, the more interesting the pattern. We build on these two elements to extract exactly the k sequential patterns with the highest leverage, consistent with our definition of expected support. We conduct experiments on both synthetic data with known patterns and real-world datasets; both sets of experiments confirm the consistency and relevance of our approach with regard to the state of the art. This article was published in Data Mining and Knowledge Discovery and is accessible at http://dx.doi.org/10.1007/s10618-016-0467-9.
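To make the idea of leverage under the partition approach concrete, the following is a minimal sketch and not the paper's method. It assumes a deliberately simplified expected support: the pattern is split into a prefix and a suffix at every position, the two parts are treated as occurring independently, and the largest resulting product is taken as the expectation. The paper's own definition of expected support is not reproduced here, and the helper names (`contains`, `support`, `leverage`, `top_k`) are hypothetical. The `top_k` helper ranks a given candidate list by brute force; it does not implement the SkOPUS branch-and-bound search.

```python
import heapq
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def support(db: List[Sequence[str]], pattern: Sequence[str]) -> float:
    """Fraction of sequences in `db` that contain `pattern`."""
    return sum(contains(s, pattern) for s in db) / len(db)


def leverage(db: List[Sequence[str]], pattern: Pattern) -> float:
    """Observed support minus a SIMPLIFIED expected support.

    Assumption (not the paper's definition): the expectation is the
    largest product support(prefix) * support(suffix) over all
    prefix/suffix splits of the pattern.
    """
    if len(pattern) < 2:
        raise ValueError("pattern needs at least two items to be split")
    expected = max(
        support(db, pattern[:i]) * support(db, pattern[i:])
        for i in range(1, len(pattern))
    )
    return support(db, pattern) - expected


def top_k(db: List[Sequence[str]], candidates: List[Pattern], k: int) -> List[Pattern]:
    """Brute-force ranking of the given candidates by leverage."""
    return heapq.nlargest(k, candidates, key=lambda p: leverage(db, p))


if __name__ == "__main__":
    db = [["a", "b"], ["a", "b"], ["c"], ["c"]]
    candidates = [("a", "b"), ("b", "a"), ("a", "c")]
    for p in candidates:
        print(p, leverage(db, p))
    print(top_k(db, candidates, 1))
```

In this toy database, `("a", "b")` appears in half of the sequences while the simplified independence assumption predicts a quarter, so its leverage is positive; the other two candidates never occur and receive negative leverage.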