Improved clustering algorithm for multivariate time series with unequal length


Journal of Computer Applications, 2017, Vol. 37, Issue 12: 3477-3481. DOI: 10.11772/j.issn.1001-9081.2017.12.3477

HUO Weigang, CHENG Zhen, CHENG Wenli

School of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China

Received: 2017-05-18; Revised: 2017-07-05; Online: 2017-12-18; Published: 2017-12-10
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61301245) and the Joint Funds of the Civil Aviation Administration of China (U1633110).

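Chinese title: 面向不等长多维时间序列的聚类改进算法

Authors (Chinese names): 霍纬纲, 程震, 程文莉

Corresponding author: HUO Weigang
About the authors: HUO Weigang (born 1978), male, from Hongtong, Shanxi; associate professor, Ph.D., CCF member; research interests: data mining and fuzzy classification. CHENG Zhen (born 1991), male, from Peixian, Jiangsu; master's student; research interest: data mining. CHENG Wenli (born 1992), female, from Hebi, Henan; master's student; research interest: big data.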

Abstract

Abstract: To address the slow speed of the existing model-based Multivariate Time Series (MTS) clustering algorithm on MTS of unequal length, an improved clustering algorithm named MUltivariate Time Series Clustering Algorithm based on Lift Ratio (LR) Component Extraction (MUTSCA〈LRCE〉) was proposed. Firstly, equal frequency discretization was used to symbolize the MTS. Then, LR vectors were calculated to express the temporal patterns between the dimensions of each MTS sample. Each LR vector was sorted, and a fixed number of distinct key components were extracted from both ends. All the extracted key components were concatenated to form a model vector representing the MTS sample, so that the MTS sample set with unequal length was transformed into a set of model vectors with equal length. Finally, the k-means algorithm was used to cluster the generated equal-length model vectors. Experimental results on multiple public data sets show that, compared with the model-based MTS clustering algorithm MUTSCA〈LR〉 (MUltivariate Time Series Clustering Algorithm〈LR〉), the proposed algorithm significantly improves the clustering speed on MTS data sets with unequal length while preserving clustering quality.
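The pipeline described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. The abstract does not define the LR formula, the time lag, or the number of symbols, so the code assumes the usual association-rule lift, P(u, v) / (P(u) P(v)), between a symbol u in one dimension at time t and a symbol v in another dimension at time t + 1. All function names and parameters (`n_symbols`, `n_keep`, `lag`) are hypothetical, and the "distinct" key components are simplified to the smallest and largest entries of each sorted LR vector. A small NumPy k-means stands in for whichever k-means implementation the paper used.

```python
import numpy as np
from itertools import product


def equal_frequency_symbols(series, n_symbols=3):
    """Map a 1-D series to integer symbols 0..n_symbols-1 using quantile edges."""
    inner_quantiles = np.linspace(0, 1, n_symbols + 1)[1:-1]
    edges = np.quantile(series, inner_quantiles)
    return np.searchsorted(edges, series, side="right")


def lift_ratio_vector(sym_a, sym_b, n_symbols=3, lag=1):
    """Assumed LR: lift of (u in A at t) -> (v in B at t + lag), for all (u, v)."""
    a, b = sym_a[:-lag], sym_b[lag:]
    lifts = []
    for u, v in product(range(n_symbols), repeat=2):
        p_u = np.mean(a == u)
        p_v = np.mean(b == v)
        p_uv = np.mean((a == u) & (b == v))
        lifts.append(p_uv / (p_u * p_v) if p_u > 0 and p_v > 0 else 0.0)
    return np.array(lifts)


def model_vector(mts, n_symbols=3, n_keep=2):
    """Turn one MTS sample of shape (length, dims) into a fixed-length vector."""
    dims = mts.shape[1]
    symbols = [equal_frequency_symbols(mts[:, d], n_symbols) for d in range(dims)]
    parts = []
    for i, j in product(range(dims), repeat=2):
        if i == j:
            continue  # only patterns between different dimensions
        lr_sorted = np.sort(lift_ratio_vector(symbols[i], symbols[j], n_symbols))
        # Keep n_keep components from each end of the sorted LR vector.
        parts.append(np.concatenate([lr_sorted[:n_keep], lr_sorted[-n_keep:]]))
    return np.concatenate(parts)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Four synthetic 3-dimensional samples with different lengths.
    samples = [rng.normal(size=(n, 3)) for n in (40, 55, 70, 90)]
    X = np.vstack([model_vector(s) for s in samples])
    print(X.shape, kmeans(X, k=2))
```

Because every ordered pair of dimensions contributes exactly `2 * n_keep` values, the model vector has length `dims * (dims - 1) * 2 * n_keep` regardless of how long the original series is, which is what lets ordinary k-means run on samples of unequal length.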

Key words: equal frequency discretization, k-means clustering, temporal pattern, Multivariate Time Series (MTS), efficiency

HUO Weigang, CHENG Zhen, CHENG Wenli. Improved clustering algorithm for multivariate time series with unequal length[J]. Journal of Computer Applications, 2017, 37(12): 3477-3481.

