Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (5): 1343-1347. DOI: 10.11772/j.issn.1001-9081.2020071142

Special topic: Data Science and Technology

• Data Science and Technology •

Time series clustering based on new robust similarity measure

LI Guorong, YE Jimin, ZHEN Yuanting

  1. School of Mathematics and Statistics, Xidian University, Xi'an, Shaanxi 710126, China
  • Received: 2020-07-31 Revised: 2020-09-18 Published: 2021-05-10 Online: 2020-11-12
  • Corresponding author: YE Jimin
  • About the authors: LI Guorong (born 1996), female, from Linfen, Shanxi, master's student; research interests: machine learning and data mining. YE Jimin (born 1967), male, from Baoji, Shaanxi, professor, Ph.D.; research interests: blind signal processing, statistical signal processing, statistical learning methods, and stochastic process theory and applications. ZHEN Yuanting (born 1996), female, from Jining, Shandong, master's student; research interests: machine learning and data mining.
  • Supported by:
    This work is partially supported by the Shaanxi Provincial Natural Science Basic Research Program (2020JM-188).

Abstract: For time series data containing outliers, a Robust Generalized Cross-Correlation (RGCC) measure between time series, based on robust estimation of the correlation coefficient, was proposed. First, a robust correlation coefficient was introduced in place of the Pearson correlation coefficient to compute the covariance matrix between time series. Second, the determinant of the new covariance matrix was used to construct a similarity measure between two time series, named RGCC. Finally, the distance matrix between the series was computed from this measure and used as the input of a clustering algorithm to cluster the data. Simulation experiments on time series clustering showed that, for time series data with outliers, the clustering results based on RGCC were clearly closer to the true clustering than those based on the original Generalized Cross-Correlation (GCC) measure. These results indicate that the proposed robust similarity measure is fully applicable to time series data with outliers.
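
The three steps in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract gives no formulas, so the code assumes a determinant-ratio form for the measure and uses Spearman rank correlation as a stand-in for the paper's robust correlation coefficient, which the abstract does not name. All function names, the lag order `k`, and the synthetic data are hypothetical.

```python
import numpy as np


def rank_transform(col):
    """Replace values by their ranks 0..n-1 (assumes no ties)."""
    ranks = np.empty(len(col))
    ranks[np.argsort(col)] = np.arange(len(col))
    return ranks


def lagged_matrix(x, k):
    """Rows are (x_t, x_{t-1}, ..., x_{t-k}) for t = k, ..., n-1."""
    n = len(x)
    return np.column_stack([x[k - j : n - j] for j in range(k + 1)])


def similarity(x, y, k=1, robust=True):
    """Assumed determinant-ratio measure in [0, 1]; larger = more dependent."""
    Z = np.hstack([lagged_matrix(x, k), lagged_matrix(y, k)])
    if robust:
        # Stand-in robust step: Pearson correlation of ranks (Spearman).
        Z = np.apply_along_axis(rank_transform, 0, Z)
    R = np.corrcoef(Z, rowvar=False)  # joint correlation matrix, 2(k+1) square
    Rxx, Ryy = R[: k + 1, : k + 1], R[k + 1 :, k + 1 :]
    ratio = np.linalg.det(R) / (np.linalg.det(Rxx) * np.linalg.det(Ryy))
    ratio = np.clip(ratio, 0.0, 1.0)  # guard against rounding error
    return 1.0 - ratio ** (1.0 / (k + 1))


def distance_matrix(series, k=1, robust=True):
    """Pairwise distances 1 - similarity, with a zero diagonal."""
    m = len(series)
    D = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            D[i, j] = D[j, i] = 1.0 - similarity(series[i], series[j], k, robust)
    return D


# Synthetic data (illustrative only): two groups, each sharing a common signal.
rng = np.random.default_rng(0)
n = 300
f1, f2 = rng.normal(size=n), rng.normal(size=n)
series = [f1 + 0.5 * rng.normal(size=n), f1 + 0.5 * rng.normal(size=n),
          f2 + 0.5 * rng.normal(size=n), f2 + 0.5 * rng.normal(size=n)]
series[0][rng.choice(n, size=5, replace=False)] += 30.0  # inject outliers

D_robust = distance_matrix(series, robust=True)   # rank-based correlations
D_plain = distance_matrix(series, robust=False)   # Pearson correlations
print(np.round(D_robust, 2))
print(np.round(D_plain, 2))
```

Either matrix can then be passed to any clustering routine that accepts precomputed distances, such as hierarchical clustering. Comparing the two printed matrices shows how a few outliers in one series affect the Pearson-based distances relative to the rank-based ones on this toy data.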

Key words: time series, clustering, outlier, correlation coefficient, robust estimation

CLC number: