Journal of Computer Applications (《计算机应用》), 2023, Vol. 43, Issue (3): 776-784. DOI: 10.11772/j.issn.1001-9081.2022020231

• Data Science and Technology •

Multi-stage weighted concept drift detection method

CHEN Zhiqiang, HAN Meng, WU Hongxin, LI Muhang, ZHANG Xilong

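  1. School of Computer Science and Engineering, North Minzu University, Yinchuan 750021
  • Received: 2022-03-02 Revised: 2022-05-25 Accepted: 2022-05-25 Online: 2022-08-16 Published: 2023-03-10
  • Corresponding author: HAN Meng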
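  • About the authors: CHEN Zhiqiang, born in 1998, male, from Yangzhou, Jiangsu, M. S. candidate, CCF member. Research interests: data mining, data stream classification.
    HAN Meng, born in 1982, female, from Shangqiu, Henan, professor, Ph. D., CCF member. Research interests: data mining.
    WU Hongxin, born in 1998, female, from Taiyuan, Shanxi, M. S. candidate. Research interests: data stream classification.
    LI Muhang, born in 1997, male, from Taiyuan, Shanxi, M. S. candidate, CCF member. Research interests: pattern mining.
    ZHANG Xilong, born in 1996, male, from Yinchuan, Ningxia, M. S. candidate, CCF member. Research interests: data stream classification.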
  • Funding:
    National Natural Science Foundation of China (62062004); Natural Science Foundation of Ningxia (2022AAC03279)

Multi-stage weighted concept drift detection method

Zhiqiang CHEN, Meng HAN, Hongxin WU, Muhang LI, Xilong ZHANG

  1. School of Computer Science and Engineering, North Minzu University, Yinchuan Ningxia 750021, China
  • Received: 2022-03-02 Revised: 2022-05-25 Accepted: 2022-05-25 Online: 2022-08-16 Published: 2023-03-10
  • Contact: Meng HAN
  • About the authors: CHEN Zhiqiang, born in 1998, M. S. candidate. His research interests include data mining and data stream classification.
    HAN Meng, born in 1982, Ph. D., professor. Her research interests include data mining.
    WU Hongxin, born in 1998, M. S. candidate. Her research interests include data stream classification.
    LI Muhang, born in 1997, M. S. candidate. His research interests include pattern mining.
    ZHANG Xilong, born in 1996, M. S. candidate. His research interests include data stream classification.
  • Supported by:
    National Natural Science Foundation of China (62062004); Natural Science Foundation of Ningxia (2022AAC03279)

Abstract:

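Existing drift detection methods cannot balance detection delay, false alarms and missed detections, and time and space efficiency. To address this, a new stage transition threshold parameter is proposed, and a multi-stage weighting mechanism comprising a “stable stage-warning stage-drift stage” sequence is introduced into concept drift detection. The mechanism assigns weights to instances stage by stage and is applied to a double sliding window. On this basis, a Multi-Stage weighted Drift Detection Method (MSDDM) based on the Hoeffding inequality is proposed. On artificial datasets, MSDDM detects abrupt and gradual concept drift faster than drift detection methods such as FHDDM and HDDM, while keeping both the false detection rate and the missed detection rate low. On real-world datasets, MSDDM achieves the highest classification accuracy among the compared methods in most cases. Experimental results show that MSDDM detects concept drift in data streams with high drift detection performance and good time and space efficiency.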

Key words: data stream, concept drift, drift detection, sliding window, multi-stage weighting mechanism
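For background, the Hoeffding inequality the method builds on yields the detection threshold used by Hoeffding-based detectors such as FHDDM. Below is the textbook bound, where $\bar{X}_n$ is the mean of $n$ independent values in $[0,1]$ (for example, per-instance prediction correctness inside a sliding window) and $\delta$ is a chosen confidence parameter. It is a sketch of the general idea only. It is not MSDDM's own stage transition rule, which is not specified here.

$$
\Pr\left(\bar{X}_n - \mathbb{E}[\bar{X}_n] \ge \varepsilon\right) \le e^{-2n\varepsilon^{2}}
\quad\xrightarrow{\;\delta \,=\, e^{-2n\varepsilon^{2}}\;}\quad
\varepsilon = \sqrt{\frac{1}{2n}\ln\frac{1}{\delta}}
$$

In FHDDM-style detection, let $p_t$ be the accuracy in the current window and $p_{\max}$ the highest window accuracy observed so far. A drift is signalled once $p_{\max} - p_t \ge \varepsilon$. A smaller $\delta$ gives a larger $\varepsilon$, so fewer false alarms are traded for a longer detection delay.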

Abstract:

Existing drift detection methods struggle to balance detection delay, false positives, false negatives, and time and space efficiency. To address this, a new stage transition threshold parameter was proposed, and a multi-stage weighting mechanism comprising a “stable stage-warning stage-drift stage” sequence was introduced into concept drift detection. The mechanism weights instances by stage and was applied to a double sliding window. On this basis, a Multi-Stage weighted Drift Detection Method (MSDDM) based on the Hoeffding inequality was proposed. On artificial datasets, MSDDM detected abrupt and gradual concept drift faster than drift detection methods such as the Fast Hoeffding Drift Detection Method (FHDDM) and the Drift Detection Method based on Hoeffding's bound (HDDM), while maintaining low false detection and missed detection rates. On real-world datasets, MSDDM achieved the highest classification accuracy in most cases compared with the other methods. Experimental results show that MSDDM can detect concept drift in data streams with high drift detection performance and good time and space efficiency.

Key words: data stream, concept drift, drift detection, sliding window, multi-stage weighting mechanism
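To make the stable/warning/drift idea concrete, here is a minimal Python sketch of a Hoeffding-bound detector with three stages over a single sliding window, in the style of FHDDM. It is not MSDDM. The stage transition threshold and the double-window weighting described above are not reproduced. Everything in the sketch is a hypothetical choice for illustration:

- the names `StagedHoeffdingDetector` and `hoeffding_eps`;
- the window size `n=25`;
- the confidence levels `delta_warn=1e-3` and `delta_drift=1e-6`;
- the linear `recency` weighting knob.

The weighted bound in `hoeffding_eps` follows from Hoeffding's inequality with normalized weights. With equal weights it reduces to $\sqrt{\ln(1/\delta)/(2n)}$.

```python
import math
from collections import deque

STABLE, WARNING, DRIFT = "stable", "warning", "drift"


def hoeffding_eps(delta, weights):
    """Hoeffding bound for a weighted mean of values in [0, 1].

    With normalised weights a_i = w_i / sum(w), Hoeffding's inequality gives
    P(mean - E[mean] >= eps) <= exp(-2 * eps**2 / sum(a_i**2)), hence
    eps = sqrt(ln(1/delta) * sum(a_i**2) / 2). Equal weights reduce this to
    sqrt(ln(1/delta) / (2n)).
    """
    total = sum(weights)
    sum_sq = sum((w / total) ** 2 for w in weights)
    return math.sqrt(math.log(1.0 / delta) * sum_sq / 2.0)


class StagedHoeffdingDetector:
    """Hypothetical stable/warning/drift detector over one sliding window.

    FHDDM-style: remember the best windowed accuracy seen so far (p_max) and
    compare the current windowed accuracy against it. This is NOT MSDDM.
    """

    def __init__(self, n=25, delta_warn=1e-3, delta_drift=1e-6, recency=0.0):
        # delta_drift < delta_warn, so the drift bound is the wider one.
        self.n = n
        self.delta_warn = delta_warn
        self.delta_drift = delta_drift
        self.recency = recency  # hypothetical knob; 0.0 gives equal weights
        self.window = deque(maxlen=n)
        self.p_max = 0.0

    def _weights(self):
        # Oldest entry gets weight 1; each newer entry gets `recency` more.
        return [1.0 + self.recency * i for i in range(len(self.window))]

    def update(self, correct):
        """Feed one prediction outcome (True = correct); return the stage."""
        self.window.append(1.0 if correct else 0.0)
        if len(self.window) < self.n:
            return STABLE  # not enough evidence yet
        weights = self._weights()
        p = sum(w * x for w, x in zip(weights, self.window)) / sum(weights)
        self.p_max = max(self.p_max, p)
        gap = self.p_max - p
        if gap >= hoeffding_eps(self.delta_drift, weights):
            self.window.clear()  # restart monitoring after a drift
            self.p_max = 0.0
            return DRIFT
        if gap >= hoeffding_eps(self.delta_warn, weights):
            return WARNING
        return STABLE


if __name__ == "__main__":
    # 300 correct predictions, then the classifier starts failing (abrupt drift).
    det = StagedHoeffdingDetector()
    stages = [det.update(i < 300) for i in range(400)]
    print(stages.index(WARNING), stages.index(DRIFT))  # prints: 309 313
```

Separate warning and drift bounds make the detector pass through an intermediate warning stage (instances 309 to 312 in the usage run) before it confirms a drift. A learner could use that stage to start preparing a replacement model. Raising `recency` weights newer outcomes more heavily. It also widens the bound, because the effective sample size shrinks.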
