Discovering meaningful conceptual structures is an important task in data mining and knowledge discovery applications. While off-the-shelf interestingness indices defined in Formal Concept Analysis (FCA) can evaluate relevance effectively in several situations, they frequently give inadequate results when faced with massive formal contexts (and their concept lattices) or with irrelevant concepts. In this paper, we introduce the Conceptual Relevance (CR) score, a new scalable interestingness measure for identifying actionable concepts. From a conceptual perspective, the minimal generators of a concept provide key information about its intent. Furthermore, the relevant attributes of a concept are those that maintain the satisfaction of its closure condition. The guiding idea of CR is therefore that minimal generators and relevant attributes can be used efficiently to assess concept relevance. Accordingly, the CR index quantifies both the number of conceptually relevant attributes and the number of minimal generators per concept intent. Our experiments on synthetic and real-world datasets demonstrate the efficiency of this measure relative to the well-known stability index.
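To make the notions of closure and minimal generator concrete, the following is a minimal sketch on a toy formal context. It uses the standard FCA definitions only: the closure of an attribute set is the set of attributes shared by all objects that have it, and a minimal generator of an intent is an inclusion-minimal attribute set whose closure equals that intent. The context data and the function names (`extent`, `closure`, `minimal_generators`) are illustrative assumptions, and the brute-force enumeration is not the scalable procedure of the paper. The CR formula itself is not stated in this abstract, so the sketch does not compute it.

```python
from itertools import combinations

# Hypothetical toy formal context: object -> set of attributes it has.
CONTEXT = {
    "o1": {"a", "b", "c"},
    "o2": {"a"},
    "o3": {"b"},
    "o4": {"c"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def extent(attrs):
    """Objects that possess every attribute in `attrs`."""
    return frozenset(o for o, owned in CONTEXT.items() if attrs <= owned)


def closure(attrs):
    """Attributes shared by all objects in the extent of `attrs`."""
    objs = extent(attrs)
    if not objs:
        # No object has all of `attrs`: the closure is the full attribute set.
        return ATTRIBUTES
    return frozenset.intersection(*(frozenset(CONTEXT[o]) for o in objs))


def minimal_generators(intent):
    """Inclusion-minimal subsets of `intent` whose closure equals `intent`.

    Brute force over subsets by increasing size; only suitable for tiny examples.
    """
    found = []
    for size in range(len(intent) + 1):
        for cand in combinations(sorted(intent), size):
            g = frozenset(cand)
            # Skip candidates that strictly contain an already-found generator.
            if any(m < g for m in found):
                continue
            if closure(g) == intent:
                found.append(g)
    return found


if __name__ == "__main__":
    intent = closure(frozenset({"a", "b"}))
    gens = minimal_generators(intent)
    print(sorted(intent), [sorted(g) for g in gens])
```

In this toy context the intent {a, b, c} has three minimal generators ({a, b}, {a, c} and {b, c}), whereas the intent {a} has only itself. The number of minimal generators per intent is one of the two quantities the CR index is described as measuring.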