Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (3): 651-656. DOI: 10.11772/j.issn.1001-9081.2020091493

Special Issue: The 37th CCF National Database Conference (NDBC 2020)

• The 37th CCF National Database Conference (NDBC 2020) •

Ensemble classification algorithm for imbalanced time series

CAO Yang1, YAN Qiuyan1,2, WU Xin1   

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou Jiangsu 221116, China;
  2. Research Center of Innovation on Intelligent Prevention of Disaster and Emergency Rescue, China University of Mining and Technology, Xuzhou Jiangsu 221116, China
  • Received: 2020-09-07 Revised: 2020-11-30 Online: 2021-03-10 Published: 2020-12-15
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61977061, 51934007).

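  • Corresponding author: YAN Qiuyan
  • About the authors: CAO Yang (born 1995), male, from Xiaogan, Hubei, M.S.; main research interest: time series data mining. YAN Qiuyan (born 1978), female, from Xuzhou, Jiangsu, associate professor, Ph.D., CCF member; main research interest: time series data mining. WU Xin (born 1997), male, from Wuxi, Jiangsu, M.S. candidate; main research interest: time series data mining.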

Abstract: Existing ensemble classification methods learn poorly from imbalanced time series data. To address this problem, both the performance of the component algorithms and the ensemble strategy were optimized, and an ensemble classification algorithm for imbalanced time series, IMHIVE-COTE (Imbalanced Hierarchical Vote Collective of Transformation-based Ensembles), was proposed on the basis of the heterogeneous ensemble method Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE). The algorithm contains two main improvements. First, a new imbalanced classification component, SBST-HESCA (SMOM & Boosting into ST-HESCA), was added, where SMOM is a K-NN-based Synthetic Minority Oversampling algorithm for Multiclass imbalance problems and ST-HESCA is the Shapelet Transformation-Heterogeneous Ensembles of Standard Classification Algorithm. This component combines boosting with resampling and updates the sample weights according to cross-validation prediction results, so that the resampling of the dataset is more conducive to improving the classification quality of minority-class samples. Second, the HIVE-COTE computational framework was improved by incorporating the SBST-HESCA component and optimizing the weights of the component algorithms, so that the imbalanced time series classification algorithm has a higher voting weight in the classification result, which further improves the overall classification quality of the ensemble. In the experiments, the performance of IMHIVE-COTE was verified and analyzed: compared with the comparison methods, IMHIVE-COTE achieved the highest overall classification evaluation, and ranked first, first and third in overall classification evaluation on three imbalanced classification indexes. These results indicate that IMHIVE-COTE is clearly better than the comparison methods at solving the imbalanced time series classification problem.
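
The two ideas summarized in the abstract, updating sample weights from cross-validation predictions and giving the imbalance-aware component a larger voting weight, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `update_sample_weights` and `weighted_vote`, the AdaBoost-style update rule, and the example component weights are all assumptions chosen for illustration, and the SMOM resampling step itself is not shown.

```python
import numpy as np


def update_sample_weights(weights, y_true, y_cv_pred):
    """Up-weight samples misclassified in cross-validation.

    Assumption: a standard AdaBoost-style rule is used here for
    illustration; the abstract does not give the exact formula.
    """
    weights = np.asarray(weights, dtype=float)
    miss = np.asarray(y_true) != np.asarray(y_cv_pred)
    # Weighted error of the cross-validation predictions, kept away from 0 and 1.
    err = np.clip(weights[miss].sum() / weights.sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1.0 - err) / err)
    # Misclassified samples grow by exp(alpha), the rest shrink by exp(-alpha).
    new_weights = weights * np.exp(np.where(miss, alpha, -alpha))
    return new_weights / new_weights.sum()


def weighted_vote(component_probas, component_weights):
    """Combine per-component class probabilities with per-component weights.

    component_probas: list of arrays, each of shape (n_samples, n_classes).
    component_weights: one non-negative weight per component (hypothetical values).
    Returns the predicted class indices and the normalized combined probabilities.
    """
    probas = np.stack(component_probas)            # (n_components, n_samples, n_classes)
    w = np.asarray(component_weights, dtype=float)
    combined = np.tensordot(w, probas, axes=1)     # (n_samples, n_classes)
    combined = combined / combined.sum(axis=1, keepdims=True)
    return combined.argmax(axis=1), combined


# One minority sample (class 1) is misclassified in cross-validation,
# so its weight increases before the next resampling round.
w = update_sample_weights([0.25] * 4, y_true=[0, 0, 0, 1], y_cv_pred=[0, 0, 0, 0])

# A standard component favors the majority class (0), while a hypothetical
# imbalance-aware component favors the minority class (1).
standard = np.array([[0.7, 0.3]])
imbalance_aware = np.array([[0.4, 0.6]])
equal_pred, _ = weighted_vote([standard, imbalance_aware], [1.0, 1.0])
boosted_pred, _ = weighted_vote([standard, imbalance_aware], [1.0, 3.0])
```

In this toy setting the misclassified minority sample ends up carrying half of the total sample weight, and raising the weight of the imbalance-aware component changes the ensemble decision from the majority class to the minority class.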

Key words: imbalanced time series, ensemble classification algorithm, boosting, K-Nearest Neighbor (K-NN), Hierarchical Vote Collective of Transformation-Based Ensembles (HIVE-COTE)

