Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (12): 3477-3481. DOI: 10.11772/j.issn.1001-9081.2017.12.3477


Improved clustering algorithm for multivariate time series with unequal length

HUO Weigang, CHENG Zhen, CHENG Wenli   

  1. School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
  • Received: 2017-05-18 Revised: 2017-07-05 Online: 2017-12-10 Published: 2017-12-18
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61301245) and the Joint Funds of Civil Aviation Administration of China (U1633110).

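  • Corresponding author: HUO Weigang (霍纬纲)
  • About the authors: HUO Weigang (霍纬纲), born 1978, male, from Hongtong, Shanxi; associate professor, Ph.D., CCF member; main research interests: data mining and fuzzy classification. CHENG Zhen (程震), born 1991, male, from Peixian, Jiangsu; master's student; main research interest: data mining. CHENG Wenli (程文莉), born 1992, female, from Hebi, Henan; master's student; main research interest: big data.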

Abstract: Existing model-based Multivariate Time Series (MTS) clustering algorithms are slow when processing MTS of unequal length. To address this problem, an improved clustering algorithm named MUltivariate Time Series Clustering Algorithm based on Lift Ratio (LR) Component Extraction (MUTSCA〈LRCE〉) was proposed. Firstly, equal frequency discretization was used to symbolize the MTS. Then, LR vectors were calculated to express the temporal patterns between the dimensions (the individual time series) of each MTS sample. Each LR vector was sorted, and a fixed number of different key components were extracted from both ends. All extracted key components were concatenated to form a model vector representing the MTS sample, so that the MTS sample set with unequal length was transformed into a model vector set with equal length. Finally, the k-means algorithm was used to cluster the generated equal-length model vectors. Experimental results on multiple public data sets show that, compared with the model-based MTS clustering algorithm MUltivariate Time Series Clustering Algorithm〈LR〉 (MUTSCA〈LR〉), the proposed algorithm significantly improves the clustering speed on MTS data sets with unequal length while maintaining the clustering quality.

Key words: equal frequency discretization, k-means clustering, temporal pattern, Multivariate Time Series (MTS), efficiency
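
The pipeline summarized in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not give the exact LR definition, the time lag, the number of symbols, or the number of extracted components. The sketch assumes the standard association-rule lift P(a, b) / (P(a) P(b)) for "symbol a in one dimension, then symbol b in another dimension `lag` steps later", and the names `symbolize`, `lift_vector`, `model_vector`, and `kmeans` and all parameter values are hypothetical.

```python
import random
from collections import Counter
from itertools import permutations


def symbolize(series, n_symbols):
    """Equal-frequency discretization: each symbol covers ~len/n_symbols points."""
    order = sorted(range(len(series)), key=lambda t: series[t])
    symbols = [0] * len(series)
    for rank, t in enumerate(order):
        symbols[t] = rank * n_symbols // len(series)
    return symbols


def lift_vector(sym_a, sym_b, n_symbols, lag=1):
    """Assumed LR vector: lift of (a at time t, b at time t+lag), lag >= 1."""
    pairs = list(zip(sym_a[:-lag], sym_b[lag:]))
    n = len(pairs)
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    vec = []
    for a in range(n_symbols):
        for b in range(n_symbols):
            denom = count_a[a] * count_b[b]
            # lift = P(a, b) / (P(a) * P(b)); 0.0 when a or b never occurs
            vec.append(joint[(a, b)] * n / denom if denom else 0.0)
    return vec


def model_vector(sample, n_symbols=3, k=2, lag=1):
    """Map one MTS sample (a list of dimensions) to a fixed-length vector."""
    symbols = [symbolize(dim, n_symbols) for dim in sample]
    vec = []
    for i, j in permutations(range(len(symbols)), 2):
        lr = sorted(lift_vector(symbols[i], symbols[j], n_symbols, lag))
        vec.extend(lr[:k] + lr[-k:])  # k smallest and k largest components
    return vec


def kmeans(vectors, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd iterations with squared Euclidean distance."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, n_clusters)
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        labels = [
            min(range(n_clusters),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centers[c])))
            for v in vectors
        ]
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Four synthetic two-dimensional samples with different lengths.
rng = random.Random(1)
samples = []
for length in (30, 45, 60, 38):
    samples.append([[rng.random() for _ in range(length)] for _ in range(2)])

vectors = [model_vector(s) for s in samples]
labels = kmeans(vectors, n_clusters=2)
```

In this sketch the model vector length is d(d-1)·2k for d dimensions, so it depends only on the number of dimensions and on k, not on the series length. That is why samples of unequal length can be passed directly to k-means.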

