Local similarity detection algorithm for time series based on distributed architecture

doi:10.11772/j.issn.1001-9081.2016.12.3285

Journal of Computer Applications, 2016, Vol. 36, Issue 12: 3285-3291.

LIN Yang, JIANG Yu'e, LIN Jie

Faculty of Software, Fujian Normal University, Fuzhou, Fujian 350108, China

Received: 2016-06-22 Revised: 2016-08-25 Online: 2016-12-10 Published: 2016-12-08
Corresponding author: LIN Jie
About the authors: LIN Yang, born in 1991, male, from Fuzhou, Fujian, M.S. candidate. His research interests include time series and big data mining. JIANG Yu'e, born in 1970, female, from Gutian, Fujian, professor, Ph.D. Her research interests include data mining. LIN Jie, born in 1972, male, from Sanming, Fujian, associate professor, Ph.D. His research interests include data mining.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61472082) and the Natural Science Foundation of Fujian Province (2014J01220).

Abstract

The CrossMatch algorithm, which builds on the idea of Dynamic Time Warping (DTW), can detect partial (local) similarity between time series. However, its high time and space complexity demands large amounts of computing resources, so it cannot be applied to long sequences. To address this problem, a local similarity detection algorithm for time series on a distributed platform is proposed. The algorithm implements CrossMatch on a distributed framework, which resolves the shortage of computing resources. First, the sequences are split and the pieces are placed on different nodes. Second, each node detects the similar parts within its own pieces. Finally, the per-node results are collected and stitched together to find the local similarities between the sequences. Experimental results show that the proposed algorithm achieves accuracy close to that of CrossMatch while taking less time. The distributed algorithm not only handles long sequences that a single machine cannot process, but also runs faster as parallel computing nodes are added.

Key words: Dynamic Time Warping (DTW), MapReduce, time series, local similarity, parallelization
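
The three steps named in the abstract (split, per-node detection, merge and stitch) can be illustrated with a small Python sketch. This is not the authors' implementation and it is not CrossMatch: the per-chunk detector is plain DTW with a length-normalized threshold, used here as a simple stand-in, and a thread pool stands in for the distributed nodes. All names (`dtw_distance`, `split`, `compare_chunks`, `stitch`, `find_local_similarity`) and the parameters `chunk_len` and `threshold` are hypothetical choices made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def dtw_distance(a, b):
    """Classic DTW with absolute-difference cost, O(len(a) * len(b)) time."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the DTW table
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[-1]


def split(series, chunk_len):
    """Step 1: cut a series into consecutive (offset, chunk) pieces."""
    return [(i, series[i:i + chunk_len])
            for i in range(0, len(series), chunk_len)]


def compare_chunks(task):
    """Step 2: work done on one 'node' for one pair of chunks.

    Assumption: a pair counts as similar when its DTW distance, divided by
    the longer chunk length, is at most `threshold`.
    """
    (ox, cx), (oy, cy), threshold = task
    dist = dtw_distance(cx, cy) / max(len(cx), len(cy))
    return (ox, oy, len(cx), len(cy)) if dist <= threshold else None


def stitch(matches):
    """Step 3: join chunk matches that continue each other in both series."""
    by_start = {(ox, oy): (lx, ly) for ox, oy, lx, ly in matches}
    ends = {(ox + lx, oy + ly) for ox, oy, lx, ly in matches}
    regions = []
    for ox, oy in sorted(by_start):
        if (ox, oy) in ends:
            continue  # continuation of an earlier match, not a new region
        ex, ey = ox, oy
        while (ex, ey) in by_start:
            lx, ly = by_start[(ex, ey)]
            ex, ey = ex + lx, ey + ly
        regions.append(((ox, ex), (oy, ey)))
    return regions


def find_local_similarity(x, y, chunk_len, threshold, workers=4):
    """Run split -> parallel compare -> stitch; the pool mimics the nodes."""
    tasks = [(cx, cy, threshold)
             for cx, cy in product(split(x, chunk_len), split(y, chunk_len))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(compare_chunks, tasks))
    return stitch([r for r in results if r is not None])


x = [0, 0, 0, 0, 1, 3, 5, 7, 9, 7, 5, 3, 20, 20, 20, 20]
y = [1, 3, 5, 7, 9, 7, 5, 3, 40, 40, 40, 40]
# Each region is ((start_x, end_x), (start_y, end_y)) with exclusive ends.
print(find_local_similarity(x, y, chunk_len=4, threshold=0.5))
# → [((4, 12), (0, 8))]
```

In this toy run, two adjacent chunk-level matches are stitched into one region covering `x[4:12]` and `y[0:8]`. The sketch only finds similar regions that line up with chunk boundaries in both series, so it is a deliberately coarse illustration of the split and merge idea rather than a substitute for the algorithm described in the paper.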


LIN Yang, JIANG Yu'e, LIN Jie. Local similarity detection algorithm for time series based on distributed architecture[J]. Journal of Computer Applications, 2016, 36(12): 3285-3291.

