Improved federated weighted average algorithm

doi:10.11772/j.issn.1001-9081.2021071264

Journal of Computer Applications, 2022, Vol. 42, Issue (4): 1131-1136.

The 36th CCF National Conference of Computer Applications (CCF NCCA 2020)

Changyin LUO¹,²,³, Junyu WANG¹,²,³, Xuebin CHEN¹,²,³, Chundi MA¹, Shufen ZHANG¹,²,³

1. College of Science, North China University of Science and Technology, Tangshan Hebei 063210, China
2. Hebei Province Key Laboratory of Data Science and Application (North China University of Science and Technology), Tangshan Hebei 063210, China
3. Tangshan Data Science Laboratory (North China University of Science and Technology), Tangshan Hebei 063210, China

Received: 2021-07-16  Revised: 2021-10-13  Accepted: 2021-10-18  Online: 2021-10-13  Published: 2022-04-10
Corresponding author: Xuebin CHEN
About the authors: LUO Changyin, born in 1994, from Ankang, Shaanxi, M. S. candidate, CCF member. His research interests include data security.
WANG Junyu, born in 1996, from Tangshan, Hebei, M. S. candidate. Her research interests include data security.
MA Chundi, born in 1999, from Tangshan, Hebei. His research interests include network security.
ZHANG Shufen, born in 1972, from Tangshan, Hebei, Ph. D., professor, CCF senior member. Her research interests include data security.
CHEN Xuebin, born in 1970, Ph. D., professor. His research interests include data security, IoT security, and network security.
Supported by: National Natural Science Foundation of China (U20A20179); Tangshan Science and Technology Project (18120203A)

Abstract

To address the problem that the improved federated average algorithm based on the analytic hierarchy process is affected by subjective factors when it calculates data quality, an improved federated weighted average algorithm was proposed that processes multi-source data from the perspective of data quality. Firstly, the training samples were divided into pre-training samples and pre-testing samples. Then, the accuracy of the initial global model on the pre-training data was used as the quality weight of each data source. Finally, the quality weights were introduced into the federated average algorithm to re-update the weights of the global model. Simulation results show that the model trained by the improved federated weighted average algorithm achieves higher accuracy than the model trained by the traditional federated average algorithm, with maximum improvements of 1.59% and 1.24% on the equally and unequally divided datasets respectively. Compared with the traditional approach of pooling the multi-party data and retraining a single model, the proposed algorithm gives slightly lower accuracy but improves the security of both data and model.

Key words: federated learning, Federated Average (FedAvg), federated weighted average algorithm, multi-source data, data quality
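The abstract outlines three steps: split each source's training samples, score each source by the initial global model's accuracy, and feed that score into federated averaging. The Python sketch below shows one way these steps could fit together. It rests on assumptions this section does not state: a model is a flat list of parameters, the pre-training share is 0.8, and each source's aggregation weight is its sample count times its quality weight, renormalized to sum to 1. The function names (`split_pretrain_pretest`, `aggregation_weights`, `weighted_fedavg`) are hypothetical, and the paper's exact update rule may differ.

```python
import random
from typing import Sequence

# Hypothetical simplification: a model is a flat list of float parameters.
Params = list[float]


def split_pretrain_pretest(samples: Sequence, pretrain_ratio: float = 0.8,
                           seed: int = 0) -> tuple[list, list]:
    """Step 1: shuffle one source's training samples and split them into
    pre-training and pre-testing parts. The 0.8 ratio is an assumed default."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * pretrain_ratio)
    return shuffled[:cut], shuffled[cut:]


def aggregation_weights(sample_counts: Sequence[int],
                        quality: Sequence[float]) -> list[float]:
    """Steps 2-3: quality[k] stands for the accuracy of the initial global model
    on source k's data (computed elsewhere). Assumed rule: weight_k is
    proportional to n_k * quality_k, then normalized to sum to 1."""
    if len(sample_counts) != len(quality):
        raise ValueError("need one sample count and one quality weight per source")
    raw = [n * q for n, q in zip(sample_counts, quality)]
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one source needs a positive weight")
    return [r / total for r in raw]


def weighted_fedavg(local_params: Sequence[Params], sample_counts: Sequence[int],
                    quality: Sequence[float]) -> Params:
    """Average the local models parameter by parameter with quality-adjusted weights."""
    weights = aggregation_weights(sample_counts, quality)
    if len(local_params) != len(weights):
        raise ValueError("need one local model per source")
    dim = len(local_params[0])
    if any(len(p) != dim for p in local_params):
        raise ValueError("all local models must have the same number of parameters")
    return [sum(w * p[i] for w, p in zip(weights, local_params)) for i in range(dim)]


if __name__ == "__main__":
    pre_train, pre_test = split_pretrain_pretest(range(10))
    print(len(pre_train), len(pre_test))  # 8 2
    # Two sources with equal sample counts but different data quality.
    counts, acc = [100, 100], [0.9, 0.6]
    print(aggregation_weights(counts, acc))  # [0.6, 0.4]
    global_model = weighted_fedavg([[1.0, 2.0], [3.0, 4.0]], counts, acc)
    print([round(x, 6) for x in global_model])  # [1.8, 2.8]
```

Under this assumed rule, giving every source the same quality weight reduces the weights to the ordinary FedAvg proportions n_k/n, so the quality term only shifts influence toward sources on which the initial global model scores better.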

CLC number: TP391

Changyin LUO, Junyu WANG, Xuebin CHEN, Chundi MA, Shufen ZHANG. Improved federated weighted average algorithm[J]. Journal of Computer Applications, 2022, 42(4): 1131-1136.

Dataset	Samples	Features	Classes
digits	5,620	64	10
recognition	20,000	16	26
segment	2,310	19	7
segmentation	2,310	19	7
telescope	19,020	10	2

Dataset	Initial global model	k=1 accuracy	k=1 variance (×10⁻⁵)	k=2 accuracy	k=2 variance (×10⁻⁵)	k=3 accuracy	k=3 variance (×10⁻⁵)
digits	Random forest	0.9628	7.25	0.9623	6.65	0.9627	9.38
	Naive Bayes	0.8038	167.22	0.7994	172.74	0.7993	175.84
	Neural network	0.9565	11.50	0.9592	10.53	0.9582	9.15
	Decision tree	0.8267	42.74	0.8272	40.16	0.8271	44.32
recognition	Random forest	0.9013	6.41	0.9018	5.42	0.9014	9.27
	Naive Bayes	0.6332	21.78	0.6340	19.37	0.6342	20.54
	Neural network	0.8399	21.01	0.8410	11.21	0.8418	12.18
	Decision tree	0.7647	21.44	0.7666	20.15	0.7676	15.80
segment	Random forest	0.9482	22.82	0.9487	30.67	0.9533	42.52
	Naive Bayes	0.7930	99.68	0.7942	110.83	0.7851	120.75
	Neural network	0.7797	303.81	0.7818	294.75	0.7886	388.84
	Decision tree	0.9251	40.52	0.9237	45.45	0.9202	54.10
segmentation	Random forest	0.9505	29.26	0.9480	31.37	0.9511	25.40
	Naive Bayes	0.7922	110.21	0.7937	127.75	0.7889	165.34
	Neural network	0.7820	297.72	0.7705	452.76	0.7890	264.45
	Decision tree	0.9188	53.95	0.9260	41.33	0.9205	64.32
telescope	Random forest	0.8660	8.34	0.8654	5.67	0.8652	6.49
	Naive Bayes	0.7281	14.81	0.7260	17.78	0.7275	16.10
	Neural network	0.8122	14.83	0.8095	19.12	0.8117	18.19
	Decision tree	0.8002	9.11	0.7997	9.95	0.8028	12.12

Dataset	Initial global model	k=1 accuracy	k=1 variance (×10⁻⁵)	k=2 accuracy	k=2 variance (×10⁻⁵)	k=3 accuracy	k=3 variance (×10⁻⁵)
digits	Random forest	0.9624	10.79	0.9627	11.69	0.9623	9.53
	Naive Bayes	0.7988	175.32	0.7937	219.40	0.7982	186.67
	Neural network	0.9602	11.58	0.9573	9.47	0.9566	11.07
	Decision tree	0.8229	36.53	0.8234	45.51	0.8230	52.87
recognition	Random forest	0.9013	5.52	0.9014	8.40	0.9027	6.17
	Naive Bayes	0.6339	18.13	0.6339	19.38	0.6324	21.41
	Neural network	0.8389	24.65	0.8389	17.84	0.83858	19.39
	Decision tree	0.7657	17.29	0.7678	14.05	0.7667	17.13
segment	Random forest	0.9511	32.55	0.9501	31.66	0.9472	29.59
	Naive Bayes	0.7902	86.48	0.7910	135.76	0.7873	128.91
	Neural network	0.8800	184.09	0.8822	182.46	0.8847	201.99
	Decision tree	0.9219	55.81	0.9227	46.23	0.9250	42.23
segmentation	Random forest	0.9497	34.26	0.9480	33.65	0.9515	24.47
	Naive Bayes	0.7860	174.30	0.7935	120.29	0.7927	152.34
	Neural network	0.8827	201.71	0.8780	210.58	0.8780	233.33
	Decision tree	0.9254	53.00	0.9175	47.24	0.9200	53.30
telescope	Random forest	0.8647	8.63	0.8655	7.81	0.8655	9.18
	Naive Bayes	0.7253	17.59	0.7274	13.24	0.7254	16.77
	Neural network	0.8065	29.24	0.8086	26.64	0.8095	21.53
	Decision tree	0.8003	11.56	0.8003	14.15	0.7993	13.53

Dataset	Partition	Initial global model	Federated weighted average accuracy	Federated weighted average variance (×10⁻⁵)	Traditional FedAvg accuracy	Traditional FedAvg variance (×10⁻⁵)
digits	Equal	Random forest	0.9630	2.49	0.9630	2.45
		Naive Bayes	0.8099	26.26	0.7986	31.09
		Neural network	0.9587	2.01	0.9586	2.03
		Decision tree	0.8283	6.65	0.8274	6.41
	Unequal	Random forest	0.9634	2.04	0.9634	2.06
		Naive Bayes	0.8114	35.29	0.7991	49.21
		Neural network	0.9593	1.89	0.9591	2.02
		Decision tree	0.8292	6.33	0.8283	6.23
recognition	Equal	Random forest	0.9011	1.55	0.9010	1.55
		Naive Bayes	0.6343	4.13	0.6337	4.02
		Neural network	0.8412	2.60	0.8408	2.56
		Decision tree	0.7678	3.42	0.7673	3.50
	Unequal	Random forest	0.9032	2.23	0.9030	2.23
		Naive Bayes	0.6329	4.60	0.6324	4.43
		Neural network	0.8403	4.99	0.8394	5.49
		Decision tree	0.7717	6.29	0.7710	6.02
segment	Equal	Random forest	0.9491	5.82	0.9489	5.98
		Naive Bayes	0.7930	36.69	0.7901	33.70
		Neural network	0.8016	77.91	0.7879	81.88
		Decision tree	0.9228	8.20	0.9225	7.85
	Unequal	Random forest	0.9506	4.69	0.9503	4.72
		Naive Bayes	0.7924	37.20	0.7889	33.31
		Neural network	0.8917	22.98	0.8841	51.07
		Decision tree	0.9235	8.89	0.9228	9.32
segmentation	Equal	Random forest	0.9504	5.29	0.9501	5.47
		Naive Bayes	0.7964	36.95	0.7933	33.84
		Neural network	0.7944	62.31	0.7784	73.85
		Decision tree	0.9215	8.28	0.9209	7.85
	Unequal	Random forest	0.9515	5.83	0.9512	5.83
		Naive Bayes	0.7949	28.47	0.7918	29.51
		Neural network	0.8906	26.15	0.8820	64.88
		Decision tree	0.9215	8.28	0.9209	7.85
telescope	Equal	Random forest	0.8651	1.36	0.8651	1.35
		Naive Bayes	0.7260	4.22	0.7259	4.16
		Neural network	0.8113	3.52	0.8107	4.47
		Decision tree	0.8001	1.85	0.8000	1.78
	Unequal	Random forest	0.8689	2.62	0.8689	2.51
		Naive Bayes	0.7253	3.99	0.7251	3.95
		Neural network	0.8091	4.09	0.8077	6.53
		Decision tree	0.8039	3.54	0.8036	3.37

Dataset	Model	Accuracy
digits	Random forest	0.9778
	Naive Bayes	0.7900
	Neural network	0.9756
	Decision tree	0.8939
recognition	Random forest	0.9659
	Naive Bayes	0.6414
	Neural network	0.9274
	Decision tree	0.8839
segment	Random forest	0.9784
	Naive Bayes	0.7965
	Neural network	0.9558
	Decision tree	0.9628
segmentation	Random forest	0.9753
	Naive Bayes	0.7697
	Neural network	0.9411
	Decision tree	0.9636
telescope	Random forest	0.8813
	Naive Bayes	0.7269
	Neural network	0.8361
	Decision tree	0.8116

Dataset	Model	Accuracy
digits	Random forest	0.9744
	Naive Bayes	0.7509
	Neural network	0.9703
	Decision tree	0.8798
recognition	Random forest	0.9402
	Naive Bayes	0.6008
	Neural network	0.8107
	Decision tree	0.6825
segment	Random forest	0.9660
	Naive Bayes	0.7738
	Neural network	0.8592
	Decision tree	0.9373
segmentation	Random forest	0.9641
	Naive Bayes	0.7701
	Neural network	0.8823
	Decision tree	0.9368
telescope	Random forest	0.8662
	Naive Bayes	0.7297
	Neural network	0.6462
	Decision tree	0.8284