计算机应用 ›› 2016, Vol. 36 ›› Issue (12): 3285-3291.DOI: 10.11772/j.issn.1001-9081.2016.12.3285

• 先进计算 • 上一篇    下一篇

基于分布式架构的时间序列局部相似检测算法

林炀, 江育娥, 林劼   

  1. 福建师范大学 软件学院, 福州 350108
  • 收稿日期:2016-06-22 修回日期:2016-08-25 出版日期:2016-12-10 发布日期:2016-12-08
  • 通讯作者: 林劼
  • 作者简介:林炀(1991-),男,福建福州人,硕士研究生,主要研究方向:时间序列、大数据挖掘;江育娥(1970-),女,福建古田人,教授,博士,主要研究方向:数据挖掘;林劼(1972-),男,福建三明人,副教授,博士,主要研究方向:数据挖掘。
  • 基金资助:
    国家自然科学基金资助项目(61472082);福建省自然科学基金资助项目(2014J01220)。

Local similarity detection algorithm for time series based on distributed architecture

LIN Yang, JIANG Yu'e, LIN Jie   

  1. Faculty of Software, Fujian Normal University, Fuzhou Fujian 350108, China
  • Received:2016-06-22 Revised:2016-08-25 Online:2016-12-10 Published:2016-12-08
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61472082), the Natural Science Foundation of Fujian Province (2014J01220).LIN Yang, born in 1991, M. S. candidate. His research interests include time series, big data mining.

摘要: 基于动态时间规整算法思想的CrossMatch算法可以用来解决序列间的部分相似问题,但是由于算法时间空间复杂度过高,需要消耗大量的计算资源,因此无法应用于长序列之间的计算。针对以上问题,提出了一个基于分布式平台上的时间序列局部相似性检测算法。将CrossMatch算法实现在了分布式框架上,解决了计算资源不足的问题。首先需要对序列进行切分,分别放置在不同的节点上;其次,各节点分别处理各自序列的相似部分;最后,通过对结果进行汇总并拼接,找出序列间的局部相似。实验结果表明,该算法在准确性上和CrossMatch相近,在时间上也有提升。改进后的分布式算法不仅解决了单机无法处理的长序列计算问题,而且可以通过增加并行计算节点数提高运行速度。

关键词: 动态时间规整, MapReduce, 时间序列, 局部相似性, 并行化

Abstract: The CrossMatch algorithm based on the idea of Dynamic Time Warping (DTW) can be used to solve the problems of local similarity between time series. However, due to the high complexity of time and space, large amounts of computing resources are required. Thus, it is almost impossible to be used for long sequences. To solve the above mentioned problems, a new algorithm for local similarity detection based on distributed platform was proposed. The proposed algorithm was a distributed solution for CrossMatch. The problem of insufficient computing resources including time and space requirement was solved. Firstly, the series should be splited and distributed on several nodes. Secondly, the local similarity of every node's own series was dealt with. Finally, the results would be merged and assembled in order to find the local similarity of series. The experimental results show that the accuracy between the proposed algorithm and the CrossMatch algorithm is similar, and the proposed algorithm uses less time. The improved distributed algorithm can not only solve the computation problem of long sequence of time series which can not be processed by a single machine, but also improve the running speed by increasing the number of parallel computing nodes.

Key words: Dynamic Time Warping (DTW), MapReduce, time series, local similarity, parallelization

中图分类号: