Journal of Computer Applications, 2017, Vol. 37, Issue (2): 335-340. DOI: 10.11772/j.issn.1001-9081.2017.02.0335

• 33rd National Database Conference of China (NDBC 2016) •

Diversified top-k shapelets transform for time series classification

SUN Qifa¹, YAN Qiuyan¹,², YAN Xinming¹

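  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou Jiangsu 221006, China;
  2. School of Safety Engineering, China University of Mining and Technology, Xuzhou Jiangsu 221006, China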
  • Received: 2016-08-12  Revised: 2016-09-07  Online: 2017-02-10  Published: 2017-02-11
  • Corresponding author: YAN Qiuyan, yanqy@cumt.edu.cn
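  • About the authors: SUN Qifa (1991-), male, born in Zaozhuang, Shandong, M.S. candidate. His research interests include time series data mining and clustering. YAN Qiuyan (1978-), female, born in Xuzhou, Jiangsu, Ph.D., associate professor, senior member of CCF. Her research interests include time series data mining and data stream mining. YAN Xinming (1993-), female, born in Xuzhou, Jiangsu, M.S. candidate. Her research interests include time series data mining.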
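  • Supported by:

    This work is partially supported by the Natural Science Foundation of Jiangsu Province (BK20140192) and the Youth Technology Foundation of China University of Mining and Technology (2013QNB16).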

Abstract:

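In time series classification based on the shapelets transform, candidate shapelets can be highly similar to one another. To address this problem, a classification method based on a diversified top-k shapelets transform, named DivTopKShapelet, was proposed. The method applies diversified top-k query techniques to remove similar shapelets and select the set of k most representative shapelets; the data set is then transformed using this optimal shapelets set as features, with the aim of improving both classification accuracy and time efficiency. Experimental results show that DivTopKShapelet not only achieves higher accuracy than traditional classification methods, but also improves classification accuracy by up to 48.43% and 32.61% compared with the clustering-based selection method (ClusterShapelet) and the shapelets-coverage method (ShapeletSelection), respectively. It also improves computational efficiency on all 15 data sets, with speedups ranging from 1.09 times to 287.8 times.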
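The abstract names two steps, diversified top-k selection of shapelets and the shapelets transform, without specifying them. The following is a minimal Python sketch of those two ideas under stated assumptions, not the paper's implementation. Assumed here: each candidate already carries a quality score (for example information gain, computed elsewhere); two shapelets count as "similar" when their length-normalised subsequence distance is at most a hypothetical `threshold`; and selection is a greedy pass in descending quality order. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Shapelet = List[float]


def subsequence_distance(shapelet: Sequence[float], series: Sequence[float]) -> float:
    """Smallest length-normalised Euclidean distance between `shapelet`
    and any equal-length sliding window of `series`."""
    m, n = len(shapelet), len(series)
    if m == 0 or m > n:
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(n - m + 1):
        sq = sum((shapelet[i] - series[start + i]) ** 2 for i in range(m))
        best = min(best, sq)
    return math.sqrt(best / m)


def shapelet_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed dissimilarity of two shapelets: slide the shorter along the longer."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    return subsequence_distance(short, long_)


def diversified_top_k(
    candidates: List[Tuple[Shapelet, float]], k: int, threshold: float
) -> List[Shapelet]:
    """Greedy diversified top-k (an assumption, not the paper's exact procedure):
    visit candidates by descending quality and keep one only if it is farther
    than `threshold` from every shapelet already kept."""
    selected: List[Shapelet] = []
    if k <= 0:
        return selected
    for shapelet, _quality in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(shapelet_distance(shapelet, s) > threshold for s in selected):
            selected.append(shapelet)
            if len(selected) == k:
                break
    return selected


def shapelet_transform(
    dataset: List[Sequence[float]], shapelets: List[Shapelet]
) -> List[List[float]]:
    """Map each series to a feature vector of its distances to the chosen shapelets."""
    return [[subsequence_distance(s, ts) for s in shapelets] for ts in dataset]


# Toy usage: the second candidate nearly duplicates the first, so it is skipped.
candidates = [
    ([0.0, 1.0, 0.0], 0.9),
    ([0.0, 1.0, 0.1], 0.8),
    ([1.0, 1.0, 1.0], 0.7),
    ([0.0, 0.0, 0.0], 0.5),
]
top = diversified_top_k(candidates, k=2, threshold=0.2)
features = shapelet_transform([[0.0, 0.0, 1.0, 0.0, 0.0], [1.0] * 5], top)
print(top)  # [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
print(features)
```

The greedy scan is a common approximation because exact diversified top-k selection is, in general, a hard combinatorial problem; `threshold` trades diversity against quality, and the resulting feature vectors can be fed to any standard classifier.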

Key words: time series classification, shapelets, diversified top-k
