Personalized privacy protection method for data with multiple numerical sensitive attributes

doi:10.11772/j.issn.1001-9081.2019091639

Journal of Computer Applications, 2020, Vol. 40, Issue 2: 491-496.

• CCF Bigdata 2019 •

Meishu ZHANG^1,2, Yabin XU^1,2,3

^1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (Beijing Information Science and Technology University), Beijing 100101, China
^2. School of Computer, Beijing Information Science and Technology University, Beijing 100101, China
^3. Beijing Advanced Innovation Center for Materials Genome Engineering (Beijing Information Science and Technology University), Beijing 100101, China

Received: 2019-08-30; Revised: 2019-10-10; Accepted: 2019-10-11; Online: 2019-10-31; Published: 2020-02-10
Contact: Yabin XU
About author: ZHANG Meishu, born in 1994 in Zhoukou, Henan, M. S. candidate. Her research interests include big data privacy protection and quantum encryption communication.
Supported by: the National Natural Science Foundation of China (61672101); the Foundation of Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (ICDDXN004); the Key Lab of Information Network Security, Ministry of Public Security (C18601)

Abstract

Existing privacy protection methods for data with multiple numerical sensitive attributes lose a large amount of quasi-identifier information and cannot satisfy users' personalized requirements for ranking the importance of the numerical sensitive attributes. To solve these problems, a personalized privacy protection method based on clustering and weighted Multi-Sensitive Bucketization (MSB) was proposed. Firstly, according to the similarity of quasi-identifiers, the dataset was divided into several subsets with similar quasi-identifier attribute values. Then, considering that users differ in how sensitive they are to each sensitive attribute, the sensitivity and the bucket capacity of the multi-dimensional buckets were used to calculate the weighted selectivity and to construct the weighted multi-dimensional buckets. Finally, the data were grouped and anonymized accordingly. Eight attributes of the standard Adult dataset from UCI were selected for experiments, and the proposed method was compared with MNSACM and WMNSAPM. Experimental results show that the proposed method performs better overall and is significantly superior to the comparison methods in reducing information loss and running time, thereby improving data quality and operating efficiency.

Key words: privacy protection, multiple numerical sensitive attributes, clustering, anonymity, personalization

CLC Number: TP391

Meishu ZHANG, Yabin XU. Personalized privacy protection method for data with multiple numerical sensitive attributes[J]. Journal of Computer Applications, 2020, 40(2): 491-496.

Figures/Tables

id	age	workclass	profit	hours-per-week
t₁	29	local-gov	1 151	25
t₂	31	self-emp-inc	4 650	40
t₃	30	local-gov	3 137	60
t₄	24	self-emp-not-inc	5 013	50
t₅	32	private	7 688	50
t₆	28	state-gov	-1 672	40
t₇	35	self-emp-inc	4 386	48
t₈	26	local-gov	7 298	43
t₉	35	private	15 024	70

Sensitive attribute value	S₁₁	S₁₂	S₁₃	S₁₄	S₁₅
S₂₁		t₁
S₂₂	t₆		{t₂, t₇}	t₈
S₂₃			t₄	t₅
S₂₄		t₃
S₂₅					t₉
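
The grid above can be reproduced from the raw records by discretizing each of the two sensitive attributes (profit and hours-per-week) and indexing every record by the resulting pair of intervals. The following is a minimal sketch only: the cut points in `PROFIT_CUTS` and `HOURS_CUTS` and the helper `build_buckets` are assumptions, chosen so that the nine sample records land in the same cells as in the table. The page does not state the intervals actually used.

```python
from bisect import bisect_right
from collections import defaultdict

# Sample records from the data table: id -> (profit, hours-per-week).
RECORDS = {
    "t1": (1151, 25), "t2": (4650, 40), "t3": (3137, 60),
    "t4": (5013, 50), "t5": (7688, 50), "t6": (-1672, 40),
    "t7": (4386, 48), "t8": (7298, 43), "t9": (15024, 70),
}

# Hypothetical interval boundaries (not given in the source).
PROFIT_CUTS = [0, 4000, 6000, 10000]  # splits profit into S11..S15
HOURS_CUTS = [30, 49, 55, 65]         # splits hours-per-week into S21..S25


def build_buckets(records, cuts_1, cuts_2):
    """Map each (S1 index, S2 index) cell to the ids of the records it holds."""
    buckets = defaultdict(list)
    for rid, (v1, v2) in records.items():
        # bisect_right returns how many cut points are <= the value,
        # so adding 1 gives a 1-based interval index.
        cell = (bisect_right(cuts_1, v1) + 1, bisect_right(cuts_2, v2) + 1)
        buckets[cell].append(rid)
    return dict(buckets)


buckets = build_buckets(RECORDS, PROFIT_CUTS, HOURS_CUTS)
for (i, j), ids in sorted(buckets.items()):
    print(f"S1{i} x S2{j}: {ids}")
```

Each key `(i, j)` corresponds to the cell in column S₁ᵢ and row S₂ⱼ of the grid. A dictionary keyed by cell keeps only the non-empty buckets, which suits sparse grids like this one.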

Sensitive attribute value	S₁₁	S₁₂	S₁₃	S₁₄	S₁₅
S₂₁		t₁ (1.26)
S₂₂	t₆ (1.33)		{t₂, t₇} (2.48)	t₈ (1.59)
S₂₃			t₄ (1.63)	t₅ (1.37)
S₂₄		t₃ (1.26)
S₂₅					t₉ (1)

id	age	workclass	profit	hours-per-week
t₁	29~32	work	1 151	25
t₂	29~32	work	4 650	40
t₅	29~32	work	7 688	50
t₃	30~35	work	3 137	60
t₇	30~35	work	4 386	48
t₉	30~35	work	15 024	70
t₄	24~28	work	5 013	50
t₆	24~28	work	-1 672	40
t₈	24~28	work	7 298	43
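
The generalized quasi-identifier values in this table follow directly from the group membership: each group's ages are replaced by their minimum-to-maximum range, and differing workclass values are replaced by a common label. The sketch below illustrates that step under stated assumptions. The function `generalize` is hypothetical, the groups are copied from the table rather than computed, and a one-level hierarchy in which every workclass value generalizes to `work` is assumed because the page does not show the hierarchy used.

```python
# Quasi-identifier values of the sample records: id -> (age, workclass).
QI = {
    "t1": (29, "local-gov"), "t2": (31, "self-emp-inc"),
    "t3": (30, "local-gov"), "t4": (24, "self-emp-not-inc"),
    "t5": (32, "private"), "t6": (28, "state-gov"),
    "t7": (35, "self-emp-inc"), "t8": (26, "local-gov"),
    "t9": (35, "private"),
}

# Groups exactly as they appear in the anonymized table.
GROUPS = [["t1", "t2", "t5"], ["t3", "t7", "t9"], ["t4", "t6", "t8"]]


def generalize(group, qi, root="work"):
    """Return the shared (age label, workclass label) for one group."""
    ages = [qi[rid][0] for rid in group]
    classes = {qi[rid][1] for rid in group}
    low, high = min(ages), max(ages)
    age_label = str(low) if low == high else f"{low}~{high}"
    # Assumed one-level hierarchy: differing values collapse to `root`.
    class_label = next(iter(classes)) if len(classes) == 1 else root
    return age_label, class_label


labels = [generalize(group, QI) for group in GROUPS]
for group, label in zip(GROUPS, labels):
    print(group, label)
```

Narrower age ranges mean less quasi-identifier information is lost, which is why grouping records with similar quasi-identifier values before anonymization matters.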

No.	Attribute	Attribute type	Number of distinct values
1	age	numerical	72
2	workclass	categorical	7
3	education	categorical	16
4	education-num	numerical	16
5	marital-status	categorical	7
6	sex	categorical	2
7	profit	numerical	207
8	hours-per-week	numerical	75
