Journal of Computer Applications, 2017, Vol. 37, Issue (8): 2349-2356. DOI: 10.11772/j.issn.1001-9081.2017.08.2349

Clustering algorithm of time series with optimal u-shapelets

YU Siqin, YAN Qiuyan, YAN Xinming   

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
  • Received: 2017-01-10 Revised: 2017-02-22 Online: 2017-08-10 Published: 2017-08-12
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (51674255) and the Natural Science Foundation of Jiangsu Province of China (BK20140192).

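Original title (Chinese): 基于最佳u-shapelets的时间序列聚类算法

  • About the authors: YU Siqin (余思琴), born 1995 in Pingxiang, Jiangxi, female, M.S. candidate; research interest: time series data mining. YAN Qiuyan (闫秋艳), born 1978 in Xuzhou, Jiangsu, female, associate professor, Ph.D.; research interests: time series data mining and machine learning. YAN Xinming (闫欣鸣), born 1993 in Xuzhou, Jiangsu, female, M.S. candidate; research interest: time series data mining.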

Abstract: To address the low quality of the u-shapelets (unsupervised shapelets) selected in u-shapelet-based time series clustering, a clustering method built on optimal u-shapelets, named DivUshapCluster, was proposed. Firstly, the influence of different subsequence quality assessment methods on u-shapelet-based clustering results was examined. Secondly, the best-performing assessment method was used to score the u-shapelet candidates. Then, a diversified top-k query was applied to remove redundant candidates and select the optimal u-shapelets. Finally, the optimal u-shapelet set was used to transform the original dataset, so as to improve clustering accuracy. Experimental results show that DivUshapCluster achieves higher clustering accuracy than traditional time series clustering methods. Compared with the BruteForce and SUSh methods, its average clustering accuracy over 22 datasets is higher by 18.80% and 19.38%, respectively. The proposed method effectively improves the clustering accuracy of time series while maintaining overall efficiency.
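
The pipeline outlined in the abstract (score candidates, drop redundant ones, transform the dataset) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The quality scores are supplied by hand because the abstract does not say which assessment measure was chosen, the greedy filter is only a simple stand-in for a diversified top-k query, and all names (`subsequence_distance`, `diversified_top_k`, `shapelet_transform`, the threshold `tau`) are hypothetical. Candidates are assumed to have equal length.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array; constant arrays map to zeros."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def subsequence_distance(series, shapelet):
    """Smallest length-normalized Euclidean distance between the
    z-normalized shapelet and any z-normalized window of the series."""
    s = znorm(shapelet)
    m = len(s)
    windows = np.lib.stride_tricks.sliding_window_view(
        np.asarray(series, dtype=float), m)
    return min(np.linalg.norm(znorm(w) - s) for w in windows) / np.sqrt(m)

def diversified_top_k(candidates, scores, k, tau):
    """Greedy stand-in for a diversified top-k query (an assumption, not
    the paper's procedure): visit candidates by descending score and keep
    one only if it is farther than tau from every candidate already kept."""
    kept = []
    for i in np.argsort(scores)[::-1]:
        if all(subsequence_distance(candidates[j], candidates[i]) > tau
               for j in kept):
            kept.append(int(i))
        if len(kept) == k:
            break
    return kept

def shapelet_transform(dataset, shapelets):
    """Represent each series by its distance to every selected shapelet."""
    return np.array([[subsequence_distance(ts, s) for s in shapelets]
                     for ts in dataset])

# Hypothetical candidates: the second equals the first after z-normalization.
bump = np.sin(np.linspace(0, 2 * np.pi, 16))
ramp = np.linspace(-1, 1, 16)
candidates = [bump, 2 * bump + 5, ramp]
scores = [0.9, 0.8, 0.5]  # made-up quality scores, for illustration only

selected = diversified_top_k(candidates, scores, k=2, tau=0.1)

def embed(pattern, start, length=48):
    """Place a pattern inside an otherwise flat series."""
    ts = np.zeros(length)
    ts[start:start + len(pattern)] = pattern
    return ts

dataset = [embed(bump, 4), embed(bump, 20), embed(ramp, 8), embed(ramp, 30)]
features = shapelet_transform(dataset, [candidates[i] for i in selected])
```

Here `features` has one row per series and one column per selected u-shapelet; any standard clustering algorithm could then be run on these rows in place of the raw series.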

Key words: time series, clustering, u-shapelets, internal clustering evaluation measurement, diversified top-k query
