Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (5): 1388-1396.DOI: 10.11772/j.issn.1001-9081.2025050659

• Artificial intelligence

Multiple active learning method based on concept drift detection

Xiaobo QI1,2, Jing ZHANG1, Ying SHI1,2,3, Hui QI1,2, Hangyuan DU3

  1. College of Computer Science and Technology, Taiyuan Normal University, Jinzhong Shanxi 030619, China
    2. Shanxi Key Laboratory for Intelligent Optimization Computing and Blockchain Technology, Taiyuan Normal University, Jinzhong Shanxi 030619, China
    3. School of Computer and Information Technology, Shanxi University, Taiyuan Shanxi 030006, China
  • Received:2025-06-16 Revised:2025-07-13 Accepted:2025-07-23 Online:2025-08-01 Published:2026-05-10
  • Contact: Hangyuan DU
  • About author: QI Xiaobo, born in 1992, Ph.D., associate professor. Her research interests include data mining and machine learning.
    ZHANG Jing, born in 1998, M.S. candidate. Her research interests include data mining and machine learning.
    SHI Ying, born in 1990, M.S., associate professor. Her research interests include bioinformatics, data mining, and machine learning.
    QI Hui, born in 1981, M.S., professor. Her research interests include data mining and machine learning.
  • Supported by:
    Humanities and Social Sciences Project of Ministry of Education (24YJAZH022); Shanxi Province Basic Research Program, Free Exploration Category (202403021221193); Shanxi Provincial Patent Transformation Special Program (202302009, 202302012); Project of Taiyuan Normal University Achievement Transformation and Technology Transfer Base (2023P003)

基于概念漂移检测的多重主动学习方法

祁晓博1,2, 张晶1, 史颖1,2,3, 亓慧1,2, 杜航原3

  1. 太原师范学院 计算机科学与技术学院,山西 晋中 030619
    2.太原师范学院 智能优化计算与区块链技术山西省重点实验室,山西 晋中 030619
    3.山西大学 计算机与信息技术学院,太原 030006
  • 通讯作者: 杜航原
  • 作者简介:祁晓博(1992—),女,山西太原人,副教授,博士,CCF会员,主要研究方向:数据挖掘、机器学习
    张晶(1998—),女,山东菏泽人,硕士研究生,CCF会员,主要研究方向:数据挖掘、机器学习
    史颖(1990—),女,山西太原人,副教授,硕士,主要研究方向:生物信息学、数据挖掘、机器学习
    亓慧(1981—),女,山西太原人,教授,硕士,CCF会员,主要研究方向:数据挖掘、机器学习
  • 基金资助:
    教育部人文社会科学项目(24YJAZH022);山西省基础研究计划项目(自由探索类)(202403021221193);山西省专利转化专项(202302009);山西省专利转化专项(202302012);太原师范学院成果转化与技术转移基地项目(2023P003)

Abstract:

The real-time, unbounded, and dynamically changing nature of data streams causes data distributions to vary over time, a phenomenon termed concept drift. Traditional methods for detecting and adapting to concept drift typically assume that the labels of all samples are available. However, the high cost of data annotation in real-world scenarios makes fully supervised learning approaches impractical, so active learning is commonly used for classification tasks with scarce labels. In streaming environments, however, factors such as concept drift and reliance on a single labeling strategy often introduce sampling bias into active learning. To address these problems, a Multiple Active Learning method based on Concept Drift detection (MALCD) was proposed. An online deep neural network model with dynamically weighted skip connections was designed and combined with a weakly supervised drift detection method to detect concept drift. In addition, multiple sampling strategies were incorporated so that different sample regions were handled with differentiated strategies. By combining multiple active learning with concept drift detection, the method can precisely select samples that have high uncertainty and cover diverse classes while efficiently avoiding redundancy. Experimental results on eight real-world and synthetic datasets show that MALCD achieved the highest average ranking in cumulative accuracy compared with methods such as the Online Ensemble Adaptive Classification (AC_OE) method and the Weakly Supervised Concept Drift Detection (WSCDD) method. This indicates that MALCD can quickly learn the new concept distribution after drift occurs, thereby improving the overall generalization performance of the model.

Key words: data stream, active learning, concept drift, multiple sampling strategy, deep neural network
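
The general idea of applying different sampling strategies to different sample regions of a stream can be illustrated with a minimal Python sketch. This is not the MALCD implementation: the abstract does not give the selection rules, so the two strategies here (margin-based uncertainty sampling and random sampling), the function names, and all thresholds and rates are assumptions chosen for illustration, and the drift flag is taken as an input rather than computed by a detector.

```python
import random


def margin(probs):
    """Gap between the two largest class probabilities (small gap = uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def should_query(probs, drift_suspected, rng,
                 margin_threshold=0.2, random_rate=0.05, drift_rate=0.5):
    """Decide whether to request the true label of one streaming sample.

    All parameters are hypothetical illustration values, not taken from the paper.
    """
    # Strategy 1: uncertainty sampling for samples near the decision boundary.
    if margin(probs) < margin_threshold:
        return True
    # Strategy 2: random sampling in confident regions, so that changes far
    # from the boundary are still observed; the rate is raised while an
    # external drift detector reports suspected drift.
    rate = drift_rate if drift_suspected else random_rate
    return rng.random() < rate


rng = random.Random(0)
# An uncertain prediction is always queried.
print(should_query([0.55, 0.45], drift_suspected=False, rng=rng))
```

Relying on uncertainty alone would never label confident samples, which is one way the sampling bias mentioned in the abstract can arise; the random branch is a common, simple counterweight.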

摘要:

数据流的实时性、无限性及动态变化特性导致数据分布具有时变性,这种随时间持续变化的现象被称为概念漂移。为检测并适应概念漂移,传统方法通常假设所有样本标签已知,但真实场景下高昂的数据标记成本使得监督学习方法代价过大,因此,主动学习方法常用于解决标签稀缺的分类任务。然而在流式环境下,概念漂移及单一标注策略等因素通常会使主动学习方法面临采样偏差。针对以上问题,提出一种基于概念漂移检测的多重主动学习方法(MALCD)。该方法设计了一种带有动态权重跳连接的在线深度神经网络模型,利用该网络模型结合弱监督漂移检测方法检测概念漂移,并融入多重采样策略,在不同样本域采用差异化策略处理。这种将多重主动学习与概念漂移检测技术相结合的方法能精准筛选不确定性高且数据类别多样的数据,高效规避冗余。在8个真实及人工数据集上的实验结果表明,MALCD的累积准确率相较于在线集成自适应分类(AC_OE)方法及弱监督概念漂移检测(WSCDD)等方法整体排名最靠前,说明该方法在漂移发生后能快速学习新概念分布,提高模型的整体泛化性能。

关键词: 数据流, 主动学习, 概念漂移, 多重采样策略, 深度神经网络

CLC Number: