Journal of Computer Applications, 2022, Vol. 42, Issue (1): 123-131. DOI: 10.11772/j.issn.1001-9081.2021071234

• Data science and technology •

Dynamic weighted ensemble classification algorithm based on accuracy climbing

Xiaojuan LI, Meng HAN, Le WANG, Ni ZHANG, Haodong CHENG

  1. School of Computer Science and Engineering, North Minzu University, Yinchuan Ningxia 750021, China
  • Received: 2021-07-15 Revised: 2021-08-30 Accepted: 2021-09-15 Online: 2021-08-30 Published: 2022-01-10
  • Contact: Meng HAN
  • About the authors: LI Xiaojuan, born in 1994, M. S. candidate. Her research interests include data stream classification.
    HAN Meng, born in 1982, Ph. D., associate professor. Her research interests include data mining.
    WANG Le, born in 1994, M. S. candidate. Her research interests include data stream classification.
    ZHANG Ni, born in 1996, M. S. candidate. Her research interests include pattern mining.
    CHENG Haodong, born in 1996, M. S. candidate. His research interests include pattern mining.
  • Supported by:
    National Natural Science Foundation of China (62062004); Ningxia Natural Science Foundation (2020AAC03216)

基于准确率爬坡的动态加权集成分类算法 (Chinese title: Dynamic weighted ensemble classification algorithm based on accuracy climbing)

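LI Xiaojuan, HAN Meng, WANG Le, ZHANG Ni, CHENG Haodong

  1. School of Computer Science and Engineering, North Minzu University, Yinchuan 750021
  • Corresponding author: HAN Meng
  • About the authors: LI Xiaojuan (born 1994), female, native of Wuzhong, Ningxia, M. S. candidate, CCF member. Main research interest: data stream classification.
    HAN Meng (born 1982), female, native of Shangqiu, Henan, associate professor, Ph. D., CCF member. Main research interest: data mining.
    WANG Le (born 1994), female, native of Baicheng, Jilin, M. S. candidate, CCF member. Main research interest: data stream classification.
    ZHANG Ni (born 1996), female, native of Changzhi, Shanxi, M. S. candidate, CCF member. Main research interest: pattern mining.
    CHENG Haodong (born 1996), male, native of Tai'an, Shandong, M. S. candidate, CCF member. Main research interest: pattern mining.
  • Funding:
    National Natural Science Foundation of China (62062004); Ningxia Natural Science Foundation (2020AAC03216)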

Abstract:

In traditional ensemble classification algorithms, the ensemble number is generally set to a fixed value, which may lead to low classification accuracy. To address this problem, an accuracy Climbing Ensemble Classification Algorithm (C-ECA) was proposed. First, instead of replacing the same number of worst-performing base classifiers with new ones, the algorithm updates the base classifiers according to their accuracy and then determines the optimal ensemble number. Second, a Dynamic Weighted Ensemble Classification Algorithm based on Climbing (C-DWECA) was proposed on the basis of C-ECA. This algorithm introduces a weighting function that obtains the optimal weight of each base classifier when the base classifiers are trained on data streams with different features, thereby improving the performance of the ensemble classifier. Finally, to detect concept drift earlier and improve the final accuracy, the Fast Hoeffding Drift Detection Method (FHDDM) was adopted. Experimental results show that C-DWECA achieves an accuracy of up to 97.44%, that its average accuracy is about 40% higher than that of the Adaptable Diversity-based Online Boosting (ADOB) algorithm, and that it also outperforms other comparison algorithms such as Leveraging Bagging (LevBag) and Adaptive Random Forest (ARF).

Key words: ensemble learning, classification, data stream, dynamic weighting, ensemble number, accuracy, climbing
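The abstract does not say how C-ECA decides when to stop adding base classifiers. As one possible reading of "accuracy climbing", the Python sketch below ranks base classifiers by their accuracy on a recent window of labelled instances, then grows the ensemble one member at a time (unweighted majority vote), continuing while ensemble accuracy does not drop and keeping the size with the highest accuracy. The function names (`accuracy`, `majority_vote`, `climb_ensemble_size`), the voting and tie-breaking rules, and the stopping rule are hypothetical choices for illustration, not the authors' C-ECA.

```python
from collections import Counter
from typing import List, Sequence


def accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of positions where the prediction equals the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def majority_vote(preds: Sequence[Sequence[int]], i: int) -> int:
    """Unweighted vote on instance i; ties go to the label seen first,
    i.e. the label backed by the highest-ranked member."""
    return Counter(p[i] for p in preds).most_common(1)[0][0]


def climb_ensemble_size(member_preds: List[List[int]], truth: List[int]) -> int:
    """Hypothetical accuracy-climbing choice of the ensemble size k."""
    # Rank members by their own accuracy on the window, best first (stable sort).
    ranked = sorted(member_preds, key=lambda p: accuracy(p, truth), reverse=True)

    def ensemble_acc(k: int) -> float:
        voted = [majority_vote(ranked[:k], i) for i in range(len(truth))]
        return accuracy(voted, truth)

    best_k = 1
    best_acc = prev_acc = ensemble_acc(1)
    for k in range(2, len(ranked) + 1):
        acc = ensemble_acc(k)
        if acc < prev_acc:  # accuracy fell: stop climbing
            break
        if acc > best_acc:  # new peak: remember this size
            best_k, best_acc = k, acc
        prev_acc = acc  # plateaus are allowed to continue the climb
    return best_k


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 1, 1]
    members = [
        [1, 1, 1, 1, 0, 0],
        [0, 0, 1, 1, 1, 1],
        [1, 1, 0, 0, 1, 1],
        [0, 0, 0, 0, 0, 0],
    ]
    print(climb_ensemble_size(members, truth))  # prints 3
```

Because the climb stops at the first drop in accuracy, it can settle on a local optimum; in exchange it only evaluates the ensemble sizes it climbs through, which keeps the cost of each update low on a stream.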

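Abstract (Chinese version):

In traditional ensemble classification algorithms, the ensemble number is generally set to a fixed value, which may lead to low classification accuracy. To address this problem, an accuracy Climbing Ensemble Classification Algorithm (C-ECA) was proposed. First, instead of using some base classifiers to replace the same number of worst-performing base classifiers, the algorithm updates the base classifiers based on accuracy and then determines the optimal ensemble number. Second, a Dynamic Weighted Ensemble Classification Algorithm based on Climbing (C-DWECA) was proposed on the basis of C-ECA. This algorithm introduces a weighting function that obtains the optimal weight of each base classifier when the base classifiers are trained on data streams with different features, thereby improving the performance of the ensemble classifier. Finally, to detect concept drift earlier and improve the final accuracy, the Fast Hoeffding Drift Detection Method (FHDDM) was adopted. Experimental results show that the accuracy of C-DWECA reaches up to 97.44%, that its average accuracy is about 40% higher than that of the Adaptable Diversity-based Online Boosting (ADOB) algorithm, and that it also outperforms other comparison algorithms such as Leveraging Bagging (LevBag) and Adaptive Random Forest (ARF).

Key words (Chinese version): ensemble learning, classification, data stream, dynamic weighting, ensemble number, accuracy, climbing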
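FHDDM is an existing detector (Pesaranghader and Viktor) that the paper adopts rather than proposes. A minimal Python sketch of its usual formulation, assuming binary correct/incorrect prediction outcomes: keep a sliding window of the last n outcomes, track the highest window accuracy p_max seen since the last reset, and signal drift when the current window accuracy p falls below it by at least the Hoeffding bound ε = sqrt(ln(1/δ) / (2n)). The class name and the defaults n = 100 and δ = 10^-6 are assumptions here; the abstract does not state the settings the authors used.

```python
import math
from collections import deque


class FHDDM:
    """Minimal sketch of the Fast Hoeffding Drift Detection Method.
    Defaults n=100, delta=1e-6 are assumed, not taken from the paper."""

    def __init__(self, n: int = 100, delta: float = 1e-6) -> None:
        self.n = n  # sliding-window size
        # Hoeffding bound: the largest accuracy drop attributable to chance.
        self.eps = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
        self.reset()

    def reset(self) -> None:
        self.window = deque(maxlen=self.n)  # 1 = correct, 0 = wrong
        self.p_max = 0.0  # best window accuracy since the last reset

    def update(self, correct: bool) -> bool:
        """Add one prediction outcome; return True when drift is signalled."""
        self.window.append(1 if correct else 0)
        if len(self.window) < self.n:
            return False  # no decision until the window is full
        p = sum(self.window) / self.n  # current window accuracy
        self.p_max = max(self.p_max, p)
        if self.p_max - p >= self.eps:
            self.reset()  # start over on the new concept
            return True
        return False


if __name__ == "__main__":
    detector = FHDDM()
    stable = [detector.update(True) for _ in range(100)]
    after = [detector.update(False) for _ in range(30)]
    print(any(stable), after.index(True))  # prints False 26
```

With these defaults ε ≈ 0.263, so after a full window of correct predictions the detector fires on the 27th consecutive error. Recomputing the window sum on every update costs O(n); a running sum would make each update O(1).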
