Symbolic Aggregate approximation (SAX) is a classical symbolic representation used in many time series data mining applications. However, SAX captures only the mean value of each segment and discards other important information, namely the trend of the values within the segment. This omission can lead to misclassification, because the SAX representation cannot distinguish time series that have similar segment means but different trends. In this paper, we present Trend Feature Symbolic Aggregate approximation (TFSAX) to solve this problem. First, we use Piecewise Aggregate Approximation (PAA) to reduce dimensionality and discretize the mean value of each segment with SAX. Second, we extract the trend feature of each segment using a trend distance factor and a trend shape factor. Third, we design multi-resolution symbolic mapping rules to discretize the trend information into symbols. We also propose a modified distance measure that integrates the SAX distance with a weighted trend distance, and we show that it provides a tighter lower bound on the Euclidean distance than the original SAX distance. Experimental results on diverse time series data sets demonstrate that the proposed representation significantly outperforms the original SAX representation and an improved SAX representation in classification.
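The limitation described in the abstract can be illustrated with a minimal sketch of standard PAA and SAX. This is not the TFSAX method itself: the trend distance factor, trend shape factor, and mapping rules are not specified in the abstract and are not reproduced here. The function names, the four-letter alphabet, the two-segment split, and the `trend_signs` helper are assumptions chosen only for illustration; the helper is a crude end-minus-start sign used to make the lost information visible.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment.

    Assumes len(series) is divisible by n_segments.
    """
    seg_len = len(series) // n_segments
    return [mean(series[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]


def sax_word(series, n_segments, alphabet="abcd"):
    """Standard SAX: z-normalize, apply PAA, then map each segment mean to a symbol.

    Breakpoints split the standard normal distribution into equiprobable regions.
    """
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    means = paa(znormalize(series), n_segments)
    return "".join(alphabet[bisect_right(breakpoints, m)] for m in means)


def trend_signs(series, n_segments):
    """Hypothetical helper (not part of TFSAX): sign of last-minus-first per segment."""
    seg_len = len(series) // n_segments
    signs = []
    for i in range(n_segments):
        seg = series[i * seg_len:(i + 1) * seg_len]
        diff = seg[-1] - seg[0]
        signs.append("+" if diff > 0 else "-" if diff < 0 else "0")
    return "".join(signs)


# Two series with identical segment means but opposite trends in each segment.
rising_then_falling = [0, 1, 2, 3, 7, 6, 5, 4]
falling_then_rising = [3, 2, 1, 0, 4, 5, 6, 7]

print(sax_word(rising_then_falling, 2), sax_word(falling_then_rising, 2))
print(trend_signs(rising_then_falling, 2), trend_signs(falling_then_rising, 2))
```

Both series receive the same SAX word because their segment means coincide, while the per-segment trend signs differ. This is the kind of information that a trend-aware representation is meant to retain.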