High-utility sequential pattern mining (HUSPM) has recently become a focus of intense research interest. The main task of HUSPM is to find all subsequences in a quantitative sequential database whose utility is at least a user-defined minimum utility threshold. However, this threshold is difficult to specify, especially when the characteristics of the database, which are invisible in most cases, are not understood. Top-k HUSPM was proposed to address this problem. To date, only very preliminary work has been done on discovering top-k high-utility sequential patterns (HUSPs), and existing strategies need improvement in running time, memory consumption, filtering of unpromising candidates, and scalability. Moreover, no systematic problem statement has been defined. In this paper, we formulate the problem of top-k HUSPM and propose a novel algorithm called TKUS. To improve efficiency, TKUS adopts a projection and local search mechanism and employs several strategies, namely Sequence Utility Raising, Terminate Descendants Early, and Eliminate Unpromising Items, which allow it to greatly reduce the search space. Finally, experimental results demonstrate that TKUS achieves sufficiently good top-k HUSPM performance compared to the state-of-the-art algorithm TKHUS-Span.
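To make the top-k formulation concrete, the following is a minimal brute-force sketch in Python. It is not TKUS and implements none of its pruning strategies. It assumes a toy database in which each element of a q-sequence holds a single item (real q-sequences hold itemsets), hypothetical unit profits in `PROFIT`, and the common convention that the utility of a pattern in a q-sequence is the maximum over its occurrences. All names here are illustrative.

```python
import heapq
from itertools import combinations

# Hypothetical unit profits (external utilities) per item.
PROFIT = {"a": 2, "b": 5, "c": 1}

# Toy quantitative sequence database: each q-sequence is a list of
# (item, quantity) pairs. Simplification: one item per element.
DB = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 1), ("c", 4)],
    [("a", 2), ("c", 1), ("b", 1)],
]


def utility_in_sequence(pattern, qseq):
    """Maximum utility over all occurrences of pattern in qseq (0 if absent)."""
    neg = float("-inf")
    # best[j] = best utility of matching pattern[:j] with the elements seen so far
    best = [0] + [neg] * len(pattern)
    for item, qty in qseq:
        u = PROFIT[item] * qty
        # Go right to left so one element is never matched twice.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == item and best[j - 1] != neg:
                best[j] = max(best[j], best[j - 1] + u)
    return best[-1] if best[-1] != neg else 0


def top_k_patterns(db, k):
    """Enumerate every subsequence pattern and keep the k of highest utility."""
    candidates = set()
    for qseq in db:
        items = [item for item, _ in qseq]
        for r in range(1, len(items) + 1):
            for idx in combinations(range(len(items)), r):
                candidates.add(tuple(items[i] for i in idx))

    heap = []      # min-heap of (utility, pattern); heap[0] is the weakest kept
    min_util = 0   # internal threshold, raised once k patterns are held
    for pat in sorted(candidates):
        u = sum(utility_in_sequence(pat, s) for s in db)
        if len(heap) < k:
            heapq.heappush(heap, (u, pat))
        elif u > min_util:
            heapq.heapreplace(heap, (u, pat))
        if len(heap) == k:
            min_util = heap[0][0]
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    for util, pat in top_k_patterns(DB, 3):
        print(pat, util)
```

The user supplies only `k`; the threshold `min_util` starts at 0 and rises as better patterns are found. This sketch evaluates every candidate exhaustively, so the rising threshold only decides what stays in the heap. A practical algorithm would additionally use it, together with utility upper bounds, to discard candidates without evaluating them.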