We introduce a framework for modeling sequential data that capture pathways of varying lengths observed in a network. Such data are important, e.g., when studying click streams in information networks, travel patterns in transportation systems, information cascades in social networks, biological pathways, or time-stamped social interactions. While it is common to apply graph analytics and network analysis to such data, recent works have shown that temporal correlations can invalidate the results of such methods. This raises a fundamental question: when is a network abstraction of sequential data justified? Addressing this open question, we propose a framework that combines Markov chains of multiple, higher orders into a multi-layer graphical model that captures temporal correlations in pathways at multiple length scales simultaneously. We develop a model selection technique to infer the optimal number of layers of such a model and show that it outperforms previously used Markov order detection techniques. An application to eight real-world data sets on pathways and temporal networks shows that our framework allows us to infer graphical models that capture both the topological and the temporal characteristics of such data. Our work highlights fallacies of network abstractions and provides a principled answer to the open question of when they are justified. By generalizing network representations to multi-order graphical models, it opens perspectives for new data mining and knowledge discovery algorithms.
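To make the idea of a multi-order model and layer selection concrete, the Python sketch below combines layers of order 0 to K: the first node of a path is scored by the order-0 layer, the second by the order-1 layer, and so on, with all later nodes scored by the order-K layer. The number of layers is then chosen by sequentially testing K against K+1. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`fit_order`, `log_likelihood`, `layer_dof`, `chi2_sf_approx`, `select_order`) are hypothetical, the likelihood ratio test is one plausible selection criterion assumed here, the degrees of freedom are a simplified count based on the observed first-order topology, and the p-value uses the Wilson-Hilferty approximation to the chi-squared distribution rather than an exact CDF.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Path = Sequence[str]
Layer = Dict[Tuple[str, ...], Dict[str, float]]


def fit_order(paths: List[Path], k: int) -> Layer:
    """ML estimate of P(next node | previous k nodes) from all length-(k+1) windows."""
    counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for p in paths:
        for i in range(k, len(p)):
            counts[tuple(p[i - k:i])][p[i]] += 1
    return {ctx: {v: c / sum(cnt.values()) for v, c in cnt.items()}
            for ctx, cnt in counts.items()}


def log_likelihood(paths: List[Path], layers: List[Layer], K: int) -> float:
    """Multi-order model with max order K: the node at position i uses layer min(i, K)."""
    ll = 0.0
    for p in paths:
        for i in range(len(p)):
            j = min(i, K)
            ll += math.log(layers[j][tuple(p[i - j:i])][p[i]])
    return ll


def layer_dof(paths: List[Path], k: int) -> int:
    """Assumed (simplified) free-parameter count of the order-k layer:
    for every walk of k nodes in the first-order graph, (out-degree of its last node - 1)."""
    nodes = {v for p in paths for v in p}
    succ: Dict[str, set] = defaultdict(set)
    for p in paths:
        for u, v in zip(p, p[1:]):
            succ[u].add(v)
    if k == 0:
        return len(nodes) - 1
    walks = [(v,) for v in nodes]
    for _ in range(k - 1):
        walks = [w + (v,) for w in walks for v in succ[w[-1]]]
    return sum(max(len(succ[w[-1]]) - 1, 0) for w in walks)


def chi2_sf_approx(x: float, k: int) -> float:
    """Wilson-Hilferty approximation of the chi-squared survival function."""
    if x <= 0:
        return 1.0
    z = ((x / k) ** (1 / 3) - (1 - 2 / (9 * k))) / math.sqrt(2 / (9 * k))
    return 0.5 * math.erfc(z / math.sqrt(2))


def select_order(paths: List[Path], max_order: int = 3, alpha: float = 0.01) -> int:
    """Start from the first-order network and add layers while the
    likelihood ratio test (K vs. K+1) rejects the simpler model."""
    layers = [fit_order(paths, j) for j in range(max_order + 1)]
    best = 1
    for K in range(1, max_order):
        x = 2.0 * (log_likelihood(paths, layers, K + 1) - log_likelihood(paths, layers, K))
        d = layer_dof(paths, K + 1)  # extra parameters of model K+1 over model K
        if d <= 0 or chi2_sf_approx(x, d) >= alpha:
            break
        best = K + 1
    return best
```

Under these assumptions, paths in which the successor of a node depends on its predecessor (e.g., `a -> c -> e` and `b -> c -> f` only) select two layers, whereas paths in which the successor of `c` is independent of how `c` was reached fall back to the first-order network, i.e. the case in which a network abstraction would be justified.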