Spreading processes on graphs arise in a host of application domains, from the study of online social networks to viral marketing to epidemiology. Various discrete-time probabilistic models for spreading processes have been proposed. These are used for downstream statistical estimation and prediction problems, often involving messages or other information that is transmitted along with infections caused by the process. It is thus important to design models of cascade observation that take into account phenomena that lead to uncertainty about the process state at any given time. We highlight one such phenomenon -- temporal distortion -- caused by a misalignment between the rate at which observations of a cascade process are made and the rate at which the process itself operates, and argue that failure to correct for it results in degradation of performance on downstream statistical tasks. To address these issues, we formulate the clock estimation problem in terms of a natural distortion measure. We give a clock estimation algorithm, which we call FastClock, that runs in linear time in the size of its input and is provably statistically accurate for a broad range of model parameters when cascades are generated from the independent cascade process with known parameters and when the underlying graph is Erd\H{o}s-R\'enyi. We further give empirical results on the performance of our algorithm in comparison to the state of the art estimator, a likelihood proxy maximization-based estimator implemented via dynamic programming. We find that, in a broad parameter regime, our algorithm substantially outperforms the dynamic programming algorithm in terms of both running time and accuracy.
翻译:从在线社交网络研究到病毒营销到流行病学等一系列应用领域都会出现图上传播过程。提出了各种分散的传播过程的时间概率模型。这些模型用于下游统计估计和预测问题,往往涉及信息或其他信息,与该过程造成的感染一起传播。因此,必须设计考虑到在任何特定时间导致进程状态不确定性的现象的级联观测模型。我们强调一种此类现象 -- -- 时间扭曲 -- -- 其原因是对级联进程的观测速度与流程本身的运作速度之间的不匹配,并论证未能纠正,导致下游统计任务业绩的退化。为了解决这些问题,我们用自然扭曲计量来制定时钟估算问题。我们称之为“快钟”的时钟估算算法,这种算法在任何特定时间里都考虑到导致进程状态不确定性。当级联级联与已知参数产生进一步对比时,以及当基础图表是Erd\Hoximeximisalation的运行时间,我们用一个基于广泛动态的轨迹谱,我们用最短的轨算法,我们用最短的轨算算法来找到我们最短的动态的进度。