Journal of Computer Applications, 2013, Vol. 33, Issue (01): 192-198. DOI: 10.3724/SP.J.1087.2013.00192

• Artificial intelligence •

Composite metric method for time series similarity measurement based on symbolic aggregate approximation

LIU Fen1,2, GUO Gongde1,2

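  1. School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007
  2. Key Laboratory of Network Security and Cryptology (Fujian Provincial University Key Laboratory), Fujian Normal University, Fuzhou 350007
  • Received: 2012-07-29 Revised: 2012-09-07 Online: 2013-01-01 Published: 2013-01-09
  • Corresponding author: GUO Gongde
  • About the authors: LIU Fen (born 1989), female, from Ningde, Fujian, M.S. candidate; main research interests: data mining and time series data mining. GUO Gongde (born 1965), male, from Longyan, Fujian, professor, Ph.D.; main research interests: data mining and machine learning.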
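  • Supported by:

    National Natural Science Foundation of China (61070062, 61175123); Major Science and Technology Project for University-Industry Cooperation of Fujian Province (2010H6007)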

Composite metric method for time series similarity measurement based on symbolic aggregate approximation

LIU Fen1,2, GUO Gongde1,2

  1. Key Laboratory of Network Security and Cryptology, Fujian Normal University, Fuzhou Fujian 350007, China
  2. School of Mathematics and Computer Science, Fujian Normal University, Fuzhou Fujian 350007, China
  • Received: 2012-07-29 Revised: 2012-09-07 Online: 2013-01-01 Published: 2013-01-09
  • Contact: GUO Gongde

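Abstract: KP_SAX, an improved Symbolic Aggregate approximation (SAX) algorithm based on key points, extends SAX by using key points to measure the point distance between time series, which lets it compute time series similarity more effectively. However, it reflects too little of the pattern information in a time series and therefore still cannot measure similarity reasonably. To address these shortcomings of SAX and KP_SAX, a composite metric method for time series similarity based on SAX is proposed. The method combines point distance and pattern distance: first, key points are used to subdivide the equal-length segments produced by Piecewise Aggregate Approximation (PAA) into sub-segments; next, each sub-segment is represented by a triple carrying both kinds of distance information; finally, a defined composite distance formula computes the similarity between time series, so that the result reflects their differences more effectively. Experiments show that the time efficiency of the improved method is only 0.96% lower than that of KP_SAX, while its ability to discriminate between time series is better than that of both KP_SAX and SAX.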

Keywords: time series, symbolic aggregate approximation, similarity, pattern distance, composite metric

Abstract: KP_SAX, a key point-based improvement of Symbolic Aggregate approximation (SAX), uses key points to measure the point distance between time series on top of SAX, and so measures time series similarity more effectively. However, it captures too little information about the patterns of a time series to measure similarity reasonably. To overcome the defects of SAX and KP_SAX, a composite metric method for time series similarity measurement based on SAX is proposed. The method combines point distance measurement with pattern distance measurement. First, key points are used to subdivide the Piecewise Aggregate Approximation (PAA) segments into sub-segments. Then each sub-segment is represented by a triple that carries the information for both kinds of distance. Finally, a composite metric formula measures the similarity between two time series, and the result reflects the difference between them more effectively. The experimental results show that the time efficiency of the proposed method is only 0.96% lower than that of the KP_SAX algorithm, while it is superior to both the KP_SAX algorithm and the traditional SAX algorithm in differentiating between two time series.

Key words: time series, Symbolic Aggregate approximation (SAX), similarity, pattern distance, composite metric
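For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of standard SAX: z-normalization, PAA over equal-length segments, symbolization with equiprobable Gaussian breakpoints, and the usual MINDIST lower-bounding distance. It is not the composite metric proposed in the paper; the key-point rule, the sub-segment triple, and the composite distance formula are not given in this abstract and are not reproduced here. All function names (`znormalize`, `paa`, `sax`, `mindist`) are illustrative choices, and the sketch assumes the series length is divisible by the number of segments.

```python
from bisect import bisect_right
from math import sqrt
from statistics import NormalDist, fmean, pstdev


def znormalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu, sigma = fmean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    if len(series) % n_segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by n_segments")
    width = len(series) // n_segments
    return [fmean(series[i * width:(i + 1) * width]) for i in range(n_segments)]


def breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions."""
    return [NormalDist().inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax(series, n_segments, alphabet_size):
    """Return the SAX word of a series as a string of letters 'a', 'b', ..."""
    cuts = breakpoints(alphabet_size)
    means = paa(znormalize(series), n_segments)
    # The symbol index is the number of breakpoints at or below the segment mean.
    return "".join(chr(ord("a") + bisect_right(cuts, m)) for m in means)


def mindist(word_a, word_b, series_length, alphabet_size):
    """Standard SAX MINDIST between two equal-length SAX words."""
    if len(word_a) != len(word_b):
        raise ValueError("SAX words must have the same length")
    cuts = breakpoints(alphabet_size)
    total = 0.0
    for ca, cb in zip(word_a, word_b):
        lo, hi = sorted((ord(ca) - ord("a"), ord(cb) - ord("a")))
        # Identical or adjacent symbols contribute zero distance.
        if hi - lo > 1:
            total += (cuts[hi - 1] - cuts[lo]) ** 2
    return sqrt(series_length / len(word_a)) * sqrt(total)


rising = [1, 2, 3, 4, 5, 6, 7, 8]
falling = rising[::-1]
word_up = sax(rising, n_segments=4, alphabet_size=4)      # "abcd"
word_down = sax(falling, n_segments=4, alphabet_size=4)   # "dcba"
print(word_up, word_down, mindist(word_up, word_down, len(rising), 4))
```

Because MINDIST treats adjacent symbols as distance zero and looks only at segment means, two series with different shapes inside a segment can map to the same word. That loss of pattern information is the limitation the abstract attributes to point-distance-only measures.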

CLC number: