Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (1): 123-131.DOI: 10.11772/j.issn.1001-9081.2021071234
• Data science and technology •
Dynamic weighted ensemble classification algorithm based on accuracy climbing
Xiaojuan LI, Meng HAN, Le WANG, Ni ZHENG, Haodong CHENG
Received: 2021-07-15
Revised: 2021-08-30
Accepted: 2021-09-15
Online: 2021-08-30
Published: 2022-01-10
Contact: Meng HAN
About author: LI Xiaojuan, born in 1994 in Wuzhong, Ningxia, is an M. S. candidate and CCF member. Her research interests include data stream classification.
Supported by:
CLC Number:
Xiaojuan LI, Meng HAN, Le WANG, Ni ZHENG, Haodong CHENG. Dynamic weighted ensemble classification algorithm based on accuracy climbing[J]. Journal of Computer Applications, 2022, 42(1): 123-131.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021071234
| Data stream | Instances | function | InstanceRandomSeed | peturbFraction |
| --- | --- | --- | --- | --- |
| Agrawal | 1 000 000 | 1 | 1 | 0.05 |
| Agrawal-1 | 500 000 | 1 | 1 | 0.05 |
| Agrawal-2 | 500 000 | 1 | 1 | 0.10 |
| Agrawal-3 | 500 000 | 2 | 2 | 0.05 |

Tab. 1 Parameters of Agrawal data streams
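The parameters in Tab. 1 can be mirrored in a small, self-contained sketch. This is a hedged illustration, not the MOA implementation: only classification function 1 is shown (group A if age < 40 or age ≥ 60, else group B, following Agrawal et al.'s original definition), only two of the nine attributes are generated, and `perturb_fraction` is interpreted, as in MOA's `peturbFraction` option, as the fraction of an attribute's value range used to randomly perturb numeric attributes.

```python
import random

def agrawal_stream(n, seed=1, perturb_fraction=0.05):
    """Yield (features, label) pairs in the style of the Agrawal generator.

    Sketch only: classification function 1, attributes age and salary.
    """
    rng = random.Random(seed)  # plays the role of InstanceRandomSeed
    for _ in range(n):
        age = rng.uniform(20, 80)
        salary = rng.uniform(20_000, 150_000)
        # The label is decided BEFORE perturbation, so the noise affects
        # the observed attributes rather than the underlying concept.
        label = 0 if (age < 40 or age >= 60) else 1
        # Perturb each numeric attribute by up to +/- perturb_fraction
        # of its value range.
        age += rng.uniform(-1, 1) * perturb_fraction * (80 - 20)
        salary += rng.uniform(-1, 1) * perturb_fraction * (150_000 - 20_000)
        yield (age, salary), label
```

With `seed` fixed, two runs produce identical streams, which is how a fixed InstanceRandomSeed makes the Agrawal-1/-2/-3 variants reproducible across algorithms.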
| Data stream | Instances | InstanceRandomSeed | numAtts | NumDriftAtts | NoisePercentage |
| --- | --- | --- | --- | --- | --- |
| Hyperplane-1 | 500 000 | 1 | 10 | 2 | 5 |
| Hyperplane-2 | 500 000 | 1 | 10 | 2 | 10 |
| Hyperplane-3 | 500 000 | 1 | 10 | 10 | 10 |

Tab. 2 Parameters of Hyperplane data streams
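A rotating-hyperplane stream with the Tab. 2 parameters can be sketched in the same hedged style: instances are uniform in the unit hypercube, the label says which side of a hyperplane the point falls on, `NoisePercentage` flips a fraction of labels, and the first `num_drift_atts` weights shift each step to produce gradual concept drift. The `drift_magnitude` parameter is an assumption of this sketch (not listed in Tab. 2).

```python
import random

def hyperplane_stream(n, num_atts=10, num_drift_atts=2, noise_percentage=5,
                      drift_magnitude=0.001, seed=1):
    """Yield (x, label) pairs from a drifting-hyperplane concept (sketch)."""
    rng = random.Random(seed)
    w = [rng.uniform(0, 1) for _ in range(num_atts)]  # hyperplane weights
    for _ in range(n):
        x = [rng.uniform(0, 1) for _ in range(num_atts)]
        # Points on one side of the hyperplane are labeled positive.
        label = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= sum(w) / 2 else 0
        # NoisePercentage: flip the label for roughly that percent of instances.
        if rng.uniform(0, 100) < noise_percentage:
            label = 1 - label
        # Concept drift: slowly shift the first num_drift_atts weights.
        for i in range(num_drift_atts):
            w[i] += drift_magnitude
        yield x, label
```

Raising `num_drift_atts` from 2 to 10 (Hyperplane-3) makes the whole concept move, which is why it is the hardest of the three variants in the results below.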
| Algorithm | Agrawal | Agrawal-1 | Agrawal-2 | Agrawal-3 | Hyperplane-1 | Hyperplane-2 | Hyperplane-3 | Wave | SEA | LED | RandomRBF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| C-ECA | 94.16 | 94.64 | 88.41 | 90.63 | 91.72 | 86.14 | 91.72 | 85.38 | 89.65 | 74.09 | 95.69 |
| C-DWECA | 97.24 | 97.29 | 93.88 | 94.07 | 93.27 | 89.67 | 92.52 | 91.73 | 91.07 | 80.42 | 97.44 |
| ADOB | 87.54 | 93.57 | 87.78 | 80.82 | 49.92 | 50.00 | 49.92 | 33.35 | 86.98 | 9.98 | 93.73 |
| BOLE | 87.69 | 92.79 | 87.12 | 77.63 | 88.87 | 83.17 | 88.87 | 82.78 | 89.54 | 74.05 | 93.93 |
| OzaBoost | 89.42 | 93.42 | 89.83 | 82.92 | 90.93 | 85.51 | 90.67 | 85.15 | 89.55 | 74.06 | 92.71 |
| AUE2 | 90.19 | 95.08 | 90.13 | 93.10 | 91.32 | 86.70 | 91.31 | 84.17 | 89.86 | 74.09 | 95.41 |
| OAUE | 90.22 | 95.09 | 90.26 | 92.11 | 91.25 | 86.70 | 91.25 | 84.30 | 89.88 | 71.12 | 95.38 |
| LevBag | 90.21 | 95.25 | 90.35 | 87.29 | 91.44 | 86.51 | 91.44 | 86.58 | 89.90 | 74.03 | 96.01 |
| OCBoost | 88.52 | 93.84 | 88.34 | 74.18 | 89.62 | 84.75 | 89.62 | 55.05 | 89.43 | 16.99 | 92.71 |
| OzaBag | 90.06 | 95.04 | 90.11 | 90.70 | 90.93 | 86.35 | 90.93 | 85.87 | 89.78 | 74.17 | 95.06 |
| ARF | 91.95 | 96.43 | 92.19 | 93.66 | 90.71 | 86.25 | 90.71 | 86.85 | 89.96 | 74.21 | 96.40 |
| ADACC | 80.78 | 85.31 | 86.09 | 63.07 | 92.39 | 77.90 | 92.39 | 77.96 | 86.49 | 57.75 | 62.97 |
| LimAttClassifier | 94.33 | 94.37 | 88.95 | 72.18 | 78.40 | 74.03 | 78.40 | 82.80 | 78.92 | 74.04 | 72.72 |

Tab. 3 Accuracy (%) comparison of C-ECA, C-DWECA and the comparison algorithms on each data stream
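The per-algorithm averages reported in Tab. 4 summarize these rows. As a quick arithmetic check, the mean of the C-DWECA row as printed above reproduces its Tab. 4 entry:

```python
# C-DWECA accuracy on the 11 data streams, copied from Tab. 3.
c_dweca = [97.24, 97.29, 93.88, 94.07, 93.27, 89.67,
           92.52, 91.73, 91.07, 80.42, 97.44]

average = round(sum(c_dweca) / len(c_dweca), 2)
print(average)  # 92.6, matching the C-DWECA entry in Tab. 4
```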
| Algorithm | Average accuracy/% |
| --- | --- |
| C-ECA | 88.55 |
| C-DWECA | 92.60 |
| ADOB | 65.78 |
| BOLE | 86.04 |
| OzaBoost | 87.65 |
| AUE2 | 89.21 |
| OAUE | 88.86 |
| LevBag | 89.00 |
| OCBoost | 78.45 |
| OzaBag | 89.00 |
| ARF | 89.93 |
| ADACC | 78.46 |
| LimAttClassifier | 80.83 |

Tab. 4 Average accuracies of different algorithms
| Algorithm | Agrawal | Agrawal-1 | Agrawal-2 | Agrawal-3 | Hyperplane-1 | Hyperplane-2 | Wave | SEA | LED | RandomRBF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| C-ECA | 243.00 | 107.00 | 95.00 | 103.00 | 96.00 | 180.00 | 189 | 50.67 | 148.00 | 89.00 |
| C-DWECA | 834.00 | 310.00 | 296.00 | 464.00 | 1 573.00 | 1 295.00 | 1 955 | 540.00 | 568.00 | 461.00 |
| ADOB | 802.00 | 886.00 | 933.00 | 390.00 | 1 656.00 | 1 597.00 | 2 362 | 280.00 | 2 756.00 | 508.00 |
| BOLE | 86.00 | 46.24 | 47.94 | 53.16 | 51.23 | 52.36 | 153 | 15.35 | 85.00 | 43.11 |
| OzaBoost | 57.39 | 23.19 | 22.85 | 24.75 | 35.34 | 36.56 | 75 | 14.77 | 35.09 | 41.60 |
| AUE2 | 62.00 | 23.88 | 30.44 | 42.89 | 61.00 | 61.00 | 111 | 19.76 | 59.39 | 55.03 |
| OAUE | 74.00 | 27.97 | 35.31 | 57.80 | 62.00 | 62.00 | 111 | 22.84 | 55.71 | 59.55 |
| LevBag | 113.00 | 45.74 | 56.07 | 69.00 | 102.00 | 100.00 | 141 | 36.18 | 70.00 | 82.00 |
| OCBoost | 60.00 | 27.06 | 28.20 | 32.86 | 40.93 | 40.83 | 88 | 18.79 | 49.31 | 42.31 |
| OzaBag | 34.11 | 15.38 | 17.56 | 22.50 | 34.50 | 34.73 | 66 | 11.97 | 31.35 | 33.52 |
| ARF | 185.00 | 96.00 | 104.00 | 109.00 | 109.00 | 106.00 | 123 | 73.00 | 65.00 | 83.00 |
| ADACC | 60.00 | 31.34 | 23.29 | 20.39 | 51.12 | 50.60 | 90 | 13.65 | 51.19 | 29.62 |
| LimAttClassifier | 48.78 | 19.06 | 18.74 | 21.77 | 35.60 | 36.54 | 92 | 9.61 | 51.49 | 36.85 |

Tab. 5 Time efficiency comparison of C-ECA, C-DWECA and the comparison algorithms
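Tab. 5 shows that C-DWECA trades running time for its accuracy gains. A per-stream slowdown factor relative to C-ECA can be derived directly from the two rows; a minimal sketch, assuming (as the table implies) that both rows are measured in the same time unit:

```python
streams = ["Agrawal", "Agrawal-1", "Agrawal-2", "Agrawal-3", "Hyperplane-1",
           "Hyperplane-2", "Wave", "SEA", "LED", "RandomRBF"]
# Times for C-ECA and C-DWECA, copied from Tab. 5.
c_eca   = [243.00, 107.00, 95.00, 103.00, 96.00,
           180.00, 189, 50.67, 148.00, 89.00]
c_dweca = [834.00, 310.00, 296.00, 464.00, 1573.00,
           1295.00, 1955, 540.00, 568.00, 461.00]

# Slowdown of the dynamically weighted variant relative to plain C-ECA.
slowdown = {s: round(d / e, 2) for s, e, d in zip(streams, c_eca, c_dweca)}
print(slowdown["Agrawal"])  # 3.43
```

The ratio ranges from roughly 3x (Agrawal) to over 16x (Hyperplane-1), quantifying the cost of the dynamic weighting step on top of the base ensemble.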