Clustering algorithm of time series with optimal u-shapelets

Journal of Computer Applications, 2017, Vol. 37, Issue (8): 2349-2356. DOI: 10.11772/j.issn.1001-9081.2017.08.2349

YU Siqin, YAN Qiuyan, YAN Xinming

School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China

Received: 2017-01-10 Revised: 2017-02-22 Online: 2017-08-12 Published: 2017-08-10
Supported by:
This work is partially supported by the National Natural Science Foundation of China (51674255) and the Natural Science Foundation of Jiangsu Province of China (BK20140192).

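About the authors: YU Siqin (born 1995), female, from Pingxiang, Jiangxi, M.S. candidate; main research interest: time series data mining. YAN Qiuyan (born 1978), female, from Xuzhou, Jiangsu, associate professor, Ph.D.; main research interests: time series data mining and machine learning. YAN Xinming (born 1993), female, from Xuzhou, Jiangsu, M.S. candidate; main research interest: time series data mining.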

Abstract

Abstract: To address the low quality of the u-shapelet (unsupervised shapelet) set in u-shapelet-based time series clustering, a clustering method based on optimal u-shapelets, named DivUshapCluster, is proposed. Firstly, the influence of different subsequence quality assessment methods on u-shapelet-based clustering results is examined. Secondly, the best-performing assessment method is used to evaluate the quality of the u-shapelet candidates. Then, diversified top-k query is applied to remove redundant candidates and select the optimal u-shapelets. Finally, the optimal u-shapelet set is used to transform the original dataset, which improves the accuracy of time series clustering. Experimental results show that DivUshapCluster achieves higher clustering accuracy than classical time series clustering methods. Compared with the BruteForce method and the SUSh method, its average clustering accuracy on 22 datasets is higher by 18.80% and 19.38%, respectively. The proposed method effectively improves the clustering accuracy of time series while maintaining overall efficiency.

Key words: time series, clustering, u-shapelets, internal clustering evaluation measure, diversified top-k query
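
The last two steps of the pipeline described in the abstract (removing redundant candidates, then transforming the dataset) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the quality measure, the redundancy criterion, or the distance function, so the sketch assumes that quality scores are already computed and passed in, that two candidates are redundant when their z-normalized subsequence distance falls below a hypothetical threshold `min_dist`, and that a simple greedy pass stands in for the diversified top-k query. All function names are illustrative.

```python
import math


def znorm(seq):
    """Z-normalize a sequence; a constant sequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def subsequence_distance(shapelet, series):
    """Smallest length-normalized Euclidean distance between the
    z-normalized shapelet and any z-normalized window of the series."""
    m = len(shapelet)
    s = znorm(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        w = znorm(series[start:start + m])
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(s, w)) / m)
        best = min(best, d)
    return best


def candidate_distance(a, b):
    """Distance between two candidates of possibly different lengths:
    slide the shorter one along the longer one."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    return subsequence_distance(short, long_)


def diversified_top_k(candidates, scores, k, min_dist):
    """Greedy stand-in for a diversified top-k query (an assumption,
    not the exact algorithm): visit candidates from best to worst score
    and keep one only if it is at least min_dist away from every
    candidate already kept. Returns the indices of kept candidates."""
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if len(kept) == k:
            break
        if all(candidate_distance(candidates[i], candidates[j]) >= min_dist
               for j in kept):
            kept.append(i)
    return kept


def transform(dataset, shapelets):
    """Map each series to a feature vector of its distances to the
    selected u-shapelets; any standard clustering algorithm can then
    be run on these vectors."""
    return [[subsequence_distance(s, ts) for s in shapelets] for ts in dataset]
```

For instance, with candidates `[0, 1, 2, 3]`, `[0, 2, 4, 6]` and `[3, 2, 1, 0]` scored 0.9, 0.8 and 0.5, calling `diversified_top_k` with `k=2` and `min_dist=0.5` keeps the first and third: the second is a rescaled copy of the first, so its z-normalized distance to it is essentially zero and it is treated as redundant.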

CLC Number: TP311.13

YU Siqin, YAN Qiuyan, YAN Xinming. Clustering algorithm of time series with optimal u-shapelets[J]. Journal of Computer Applications, 2017, 37(8): 2349-2356.

