Journal of Computer Applications, 2017, Vol. 37, Issue (8): 2343-2348. DOI: 10.11772/j.issn.1001-9081.2017.08.2343

• Data Science and Technology •

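Shapelet classification method based on trend feature representation

YAN Xinming, MENG Fanrong, YAN Qiuyan

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou Jiangsu 221116, China
  • Received: 2017-02-22 Revised: 2017-05-03 Online: 2017-08-10 Published: 2017-08-12
  • Corresponding author: YAN Xinming
  • About the authors: YAN Xinming (1993-), female, native of Xuzhou, Jiangsu, master's student; research interest: time series data mining. MENG Fanrong (1962-), female, native of Shenyang, Liaoning, Ph. D., professor; research interests: databases, data mining. YAN Qiuyan (1978-), female, native of Xuzhou, Jiangsu, Ph. D., associate professor; research interests: time series data mining, machine learning.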
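  • Supported by:
    This work is partially supported by the National Key Research and Development Program (2016YFC060908), the National Natural Science Foundation of China (61402482, 61572505, 51674255), and the Natural Science Foundation of Jiangsu Province (BK20140192).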

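Abstract: A shapelet is a discriminative time series subsequence that enables accurate classification of time series by identifying local features. The original shapelet discovery algorithm is inefficient, and much subsequent work has focused on improving the efficiency of shapelet discovery. However, when typical time series representations are used to discover shapelets in time series with trend changes, the trend information in the series tends to be lost. To solve this problem, a diversified top-k shapelet classification method based on trend features was proposed. First, a trend feature symbolization method was used to represent the trend information of the time series. Then, a shapelet candidate set was obtained from the trend feature symbols of the series. Finally, a diversified top-k query algorithm was introduced to select the k most representative shapelets from the candidate set. In time series classification experiments, the proposed method improved classification accuracy on all 11 data sets compared with traditional classification algorithms; compared with the FastShapelet algorithm, it improved efficiency and shortened running time, with the most pronounced gains on data with obvious trend information. The results show that the proposed method can effectively improve both the accuracy and the efficiency of time series classification.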
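
The three steps in the abstract (trend symbolization, candidate extraction, diversified top-k selection) can be illustrated with a small Python sketch. It is not the authors' algorithm. The trend symbolization (least-squares slope of fixed-length segments mapped to `U`/`D`/`F` with a hypothetical flatness threshold `flat_eps`), the candidate quality measure (information gain of a "contains / does not contain" split on the symbol strings), and the similarity test used for diversification (one candidate being a substring of the other) are all assumptions chosen for illustration, and every function name is hypothetical. The selection loop follows the common greedy approximation to a diversified top-k query: take candidates in descending score order and skip any that is similar to one already chosen.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set


def segment_slope(values: Sequence[float]) -> float:
    """Least-squares slope of `values` against the positions 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def trend_symbols(series: Sequence[float], seg_len: int, flat_eps: float) -> str:
    """Step 1 (assumed scheme): one symbol per non-overlapping segment,
    'U' = rising, 'D' = falling, 'F' = |slope| <= flat_eps.
    A trailing partial segment is dropped."""
    symbols = []
    for start in range(0, len(series) - seg_len + 1, seg_len):
        slope = segment_slope(series[start:start + seg_len])
        symbols.append("F" if abs(slope) <= flat_eps else ("U" if slope > 0 else "D"))
    return "".join(symbols)


def candidate_set(words: Sequence[str], lengths: Sequence[int]) -> Set[str]:
    """Step 2: every symbol substring of the requested lengths is a candidate."""
    return {w[i:i + L] for w in words for L in lengths for i in range(len(w) - L + 1)}


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(cand: str, words: Sequence[str], labels: Sequence[str]) -> float:
    """Assumed quality measure: information gain of splitting the training set
    into series whose symbol string contains `cand` and those that do not."""
    inside = [y for w, y in zip(words, labels) if cand in w]
    outside = [y for w, y in zip(words, labels) if cand not in w]
    n = len(labels)
    return (entropy(labels)
            - len(inside) / n * entropy(inside)
            - len(outside) / n * entropy(outside))


def overlaps(a: str, b: str) -> bool:
    """Hypothetical similarity test: one candidate is a substring of the other."""
    return a in b or b in a


def diversified_top_k(scores: Dict[str, float], k: int,
                      similar: Callable[[str, str], bool] = overlaps) -> List[str]:
    """Step 3: greedy diversified top-k. Walk candidates by descending score
    (ties broken alphabetically) and keep one only if it is not similar
    to any candidate already kept."""
    chosen: List[str] = []
    for cand, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if len(chosen) == k:
            break
        if not any(similar(cand, c) for c in chosen):
            chosen.append(cand)
    return chosen


if __name__ == "__main__":
    train = [([0, 1, 2, 3, 2, 1], "A"), ([0, 0, 1, 2, 1, 0], "A"),
             ([3, 2, 1, 0, 0, 0], "B"), ([2, 1, 1, 1, 2, 3], "B")]
    words = [trend_symbols(s, seg_len=2, flat_eps=0.1) for s, _ in train]
    labels = [y for _, y in train]
    scores = {c: info_gain(c, words, labels) for c in candidate_set(words, (2, 3))}
    print(words)                           # ['UUD', 'FUD', 'DDF', 'DFU']
    print(diversified_top_k(scores, k=4))  # ['DF', 'UD', 'DD', 'UU']
```

On these four toy series, `DF` and `UD` each separate the two classes perfectly and are kept first. The greedy pass then skips `DDF`, `DFU` and `FUD`: their scores tie with `DD` and `UU`, but each contains a candidate that has already been selected. This is the kind of redundancy that diversification is meant to remove.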

Key words: shapelet, trend feature, symbolization, diversified top-k query, time series classification
