计算机应用 ›› 2019, Vol. 39 ›› Issue (2): 414-420.DOI: 10.11772/j.issn.1001-9081.2018061326

• 数据科学与技术 • 上一篇    下一篇

基于子序列全连接和最大团的时间序列模体发现算法

朱跃龙, 朱晓晓, 王继民   

  1. 河海大学 计算机与信息学院 南京 211100
  • 收稿日期:2018-06-25 修回日期:2018-07-31 出版日期:2019-02-10 发布日期:2019-02-15
  • 通讯作者: 朱晓晓
  • 作者简介:朱跃龙(1959-),男,江苏建湖人,教授,博士,主要研究方向:智能信息处理、数据挖掘;朱晓晓(1994-),女,山东滕州人,硕士研究生,主要研究方向:数据挖掘;王继民(1976-),男,安徽全椒人,副教授,硕士,主要研究方向:数据挖掘、智能信息处理。

Time series motif discovery algorithm based on subsequence full join and maximum clique

ZHU Yuelong, ZHU Xiaoxiao, WANG Jimin   

  1. College of Computer and Information, Hohai University, Nanjing Jiangsu 211100
  • Received:2018-06-25 Revised:2018-07-31 Online:2019-02-10 Published:2019-02-15

摘要: 针对时间序列模体发现算法计算复杂,并且无法发现多实例模体的问题,提出基于子序列全连接和最大团的时间序列模体发现(TSSJMC)算法。首先,使用快速时间序列子序列全连接算法求得所有子序列之间的距离,生成距离矩阵;然后,设置相似性阈值,将距离矩阵转化为邻接矩阵,构造子序列相似图;最后采用最大团搜索算法从相似图中搜索最大团,最大团的顶点对应的时间序列为包含最多实例的模体。在公开的时间序列数据集上进行实验,选用已有的能够发现多实例模体的Brute Force和Random Projection算法作为对比对象,分别从准确性、效率、可扩展性和鲁棒性对TSSJMC算法进行分析并获得了客观的评判结果。实验结果表明,与Random Projection算法相比,TSSJMC算法在效率、可扩展性和鲁棒性法方面均有明显优势;与Brute Force算法相比,TSSJMC算法发现的模体实例数量虽略低,但其效率和可扩展性都优于Brute Force算法。因此,TSSJMC是质量和效率相平衡的算法。

关键词: 时间序列, 时间序列子序列, 子序列连接, 最大团, 模体发现

Abstract: Existing time series motif discovery algorithms have high computational complexity and cannot find multi-instance motifs. To overcome these defects, a Time Series motif discovery algorithm based on Subsequence full Joins and Maximum Clique (TSSJMC) was proposed. Firstly, the fast time series subsequence full join algorithm was used to obtain the distance between all subsequences and generate the distance matrix. Then, the similarity threshold was set, the distance matrix was transformed into the adjacency matrix, and the sub-sequence similarity graph was constructed. Finally, the maximum clique in the similarity graph was extracted by the maximum clique search algorithm, and the time series corresponding to the vertices of the maximum clique were the motifs containing most instances. In the experiments on public time series datasets, TSSJMC algorithm was compared with Brute Force algorithm and Random Projection algorithm which also could find multi-instance motifs in accuracy, efficiency, scalability and robustness. The experimental results demonstrate that compared with Random Projection algorithm, TSSJMC algorithm has obvious advantages in terms of efficiency, scalability and robustness; compared with Bruce Force algorithm, TSSJMC algorithm finds slightly less motif instances, but its efficiency and scalability are better. Therefore, TSSJMC is an algorithm that balances quality and efficiency.

Key words: time series, time series subsequence, subsequence join, maximum clique, motif discovery

中图分类号: