Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (6): 1664-1675. DOI: 10.11772/j.issn.1001-9081.2022060881

• The 37th CCF National Conference of Computer Applications (CCF NCCA 2022)

Overview of classification methods for complex data streams with concept drift

Dongliang MU, Meng HAN, Ang LI, Shujuan LIU, Zhihui GAO

  1. School of Computer Science and Engineering, North Minzu University, Yinchuan Ningxia 750021, China
  • Received: 2022-06-20 Revised: 2022-08-09 Accepted: 2022-08-12 Online: 2022-08-24 Published: 2023-06-10
  • Contact: Meng HAN
  • About author: MU Dongliang, born in 1998, M. S. candidate. His research interests include big data mining.
    HAN Meng, born in 1982, Ph. D., professor. Her research interests include big data mining.
    LI Ang, born in 1999, M. S. candidate. His research interests include big data mining.
    LIU Shujuan, born in 1998, M. S. candidate. Her research interests include big data mining.
    GAO Zhihui, born in 1996, M. S. candidate. Her research interests include big data mining.
  • Supported by:
    National Natural Science Foundation of China (62062004); Natural Science Foundation of Ningxia (2020AAC03216)

概念漂移复杂数据流分类方法综述

穆栋梁, 韩萌, 李昂, 刘淑娟, 高智慧

  1. 北方民族大学 计算机科学与工程学院,银川 750021
  • 通讯作者: 韩萌
  • 作者简介:穆栋梁(1998—),男,山西大同人,硕士研究生,CCF会员,主要研究方向:大数据挖掘
    韩萌(1982—),女,河南商丘人,教授,博士,CCF会员,主要研究方向:大数据挖掘,Email:2003051@nmu.edu.cn
    李昂(1999—),男,河南洛阳人,硕士研究生,主要研究方向:大数据挖掘
    刘淑娟(1998—),女,河南新乡人,硕士研究生,主要研究方向:大数据挖掘
    高智慧(1996—),女,山东临沂人,硕士研究生,主要研究方向:大数据挖掘。
  • 基金资助:
    国家自然科学基金资助项目(62062004);宁夏自然科学基金资助项目(2020AAC03216)

Abstract:

Traditional classifiers struggle to cope with complex types of data streams with concept drift, and the classification results they produce are often unsatisfactory. Focusing on how concept drift is handled in different types of data streams, classification methods for complex data streams with concept drift were summarized from four aspects: imbalance, concept evolution, multi-label and noise. Firstly, the classification methods for the four aspects were introduced and analyzed: block-based and online learning approaches for imbalanced data streams with concept drift; clustering-based and model-based learning approaches for data streams with both concept evolution and concept drift; problem transformation-based and algorithm adaptation-based learning approaches for multi-label data streams with concept drift; and approaches for noisy data streams with concept drift. Then, the experimental results and performance metrics of these methods were compared and analyzed in detail. Finally, the shortcomings of the existing methods and future research directions were given.

Key words: data stream classification, complex data stream, concept drift, imbalanced data stream, concept evolution

摘要:

传统分类器难以应对含概念漂移的复杂类型数据流分类这一难题,且得到的分类效果往往不尽如人意。针对不同类型数据流中处理概念漂移的方法,从不平衡、概念演化、多标签和含噪声4个方面对概念漂移复杂数据流分类方法进行了综述。首先,对基于块的和基于在线的学习方式对不平衡概念漂移数据流、基于聚类和基于模型的学习方式对概念演化概念漂移数据流、基于问题转换和基于算法适应的学习方式对多标签概念漂移数据流和含噪声概念漂移数据流这四个方面的分类方法进行了分析介绍;然后,对所提到概念漂移复杂数据流分类方法的实验结果及性能指标进行了详细的对比和分析;最后,给出了现有方法的不足和下一步研究方向。

关键词: 数据流分类, 复杂数据流, 概念漂移, 不平衡数据流, 概念演化

CLC Number: