The widespread adoption of big data techniques has increased interest in change-point detection algorithms that perform well in multivariate, possibly high-dimensional, settings. In this paper, we propose a new method for consistently estimating the number and locations of multiple generalized change-points in multivariate, possibly high-dimensional, noisy data sequences. The number of change-points is allowed to increase with the sample size and the dimensionality of the data sequence. The unknown multivariate signal is made up of a number of univariate component signals, and our algorithm can handle general structural changes in them; we focus on changes in the mean vector of a multivariate piecewise-constant signal, as well as changes in the linear trend of any of the univariate component signals. Our proposed algorithm, labeled Multivariate Isolate-Detect (MID), achieves consistent change-point detection in a computationally fast way, even in the presence of frequent changes of possibly small magnitudes.
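To make the mean-change setting concrete, the following is a minimal sketch of a generic CUSUM scan for a single change in the mean vector of a multivariate piecewise-constant signal. It is not the MID algorithm: the function name `cusum_mean_change`, the choice of aggregating the component-wise statistics with a maximum, and the simulated data are all assumptions made for illustration only.

```python
import numpy as np


def cusum_mean_change(x):
    """Scan a (T, d) array for one change in mean.

    Hypothetical helper for illustration, not the MID procedure.
    Returns (b_hat, stat): the first b_hat rows form the left segment,
    and stat is the maximal aggregated CUSUM value.
    """
    T, d = x.shape
    b = np.arange(1, T)                        # candidate split sizes 1..T-1
    csum = np.cumsum(x, axis=0)                # running sums per component
    total = csum[-1]
    left_mean = csum[:-1] / b[:, None]
    right_mean = (total - csum[:-1]) / (T - b)[:, None]
    weight = np.sqrt(b * (T - b) / T)[:, None]
    stat = np.abs(weight * (left_mean - right_mean))  # shape (T-1, d)
    agg = stat.max(axis=1)                     # assumed max aggregation over components
    k = int(np.argmax(agg))
    return int(b[k]), float(agg[k])


# Simulated example: 5 component signals, a mean shift in one of them at t = 120.
rng = np.random.default_rng(0)
signal = np.zeros((200, 5))
signal[120:, 2] = 2.0
data = signal + rng.normal(scale=1.0, size=signal.shape)
b_hat, stat = cusum_mean_change(data)
print("estimated change-point:", b_hat, "statistic:", round(stat, 2))
```

A single scan of this kind handles only one change-point; detecting multiple, frequent changes requires applying such a statistic on suitably chosen subintervals, together with a threshold or model-selection step.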