CCML2021+319: Hybrid K-Anonymous Feature Selection Algorithm

YANG Liu (杨柳), LI Yun (李云)

Nanjing University of Posts and Telecommunications

Received: 2021-06-09 Revised: 2021-06-22 Online: 2021-06-22
Corresponding author: YANG Liu

Abstract

Abstract: In the era of big data, protecting data privacy has become an issue that cannot be ignored. K-anonymity is a classic privacy-protection method that generalizes and suppresses (hides) data until the K-anonymity condition is met. When features are suppressed with both the privacy and the classification performance of the data in mind, the process can be regarded as a special form of feature selection, namely K-anonymous feature selection. K-anonymous feature selection combines the characteristics of K-anonymity and feature selection, using multiple evaluation criteria to select a K-anonymous feature subset. Filter-based K-anonymous feature selection methods struggle to search all candidate feature subsets that satisfy the K-anonymity condition and cannot guarantee that the selected subset has optimal classification performance, while wrapper-based feature selection methods are computationally expensive. Therefore, by combining filter-based feature ranking with wrapper-based feature selection and improving the forward search strategy of existing methods, a hybrid K-anonymous feature selection method is designed that uses classification performance as the evaluation criterion to select the K-anonymous feature subset with the best classification performance. Experiments on multiple public datasets show that the proposed algorithm can outperform existing algorithms in classification performance while incurring smaller information loss.

Key words: hybrid, feature selection, privacy protection, K-anonymity, search strategy
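The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea it describes: rank features with a filter criterion, then walk the ranking with a wrapper-style forward search that keeps a feature only if the subset stays K-anonymous and the classification score improves. Everything here is an assumption for illustration, not the authors' method: the function names, the use of mutual information as the filter criterion, the lookup-table classifier with resubstitution accuracy as the wrapper score, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def is_k_anonymous(rows, features, k):
    """True if every value combination over `features` occurs in at least k rows."""
    groups = Counter(tuple(row[f] for f in features) for row in rows)
    return all(count >= k for count in groups.values())


def mutual_information(rows, labels, feature):
    """Filter criterion (assumed): mutual information between one feature and the label."""
    n = len(rows)
    joint = Counter((row[feature], y) for row, y in zip(rows, labels))
    px = Counter(row[feature] for row in rows)
    py = Counter(labels)
    return sum(c / n * log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def table_accuracy(rows, labels, features):
    """Wrapper score (assumed): training accuracy of a majority-label lookup table."""
    by_group = defaultdict(Counter)
    for row, y in zip(rows, labels):
        by_group[tuple(row[f] for f in features)][y] += 1
    correct = sum(max(counts.values()) for counts in by_group.values())
    return correct / len(rows)


def hybrid_k_anonymous_selection(rows, labels, features, k):
    """Rank features with the filter, then forward-select under the K-anonymity constraint."""
    ranked = sorted(features,
                    key=lambda f: mutual_information(rows, labels, f),
                    reverse=True)
    selected = []
    best = table_accuracy(rows, labels, selected)  # majority-class baseline
    for f in ranked:
        candidate = selected + [f]
        if not is_k_anonymous(rows, candidate, k):
            continue  # adding f would break K-anonymity, so f stays hidden
        score = table_accuracy(rows, labels, candidate)
        if score > best:  # keep f only if classification improves
            selected, best = candidate, score
    return selected, best


# Hypothetical toy data: "zip" is unique per row, "age" is informative, "sex" is not.
rows = [
    {"age": "young", "sex": "M", "zip": "z1"},
    {"age": "young", "sex": "M", "zip": "z2"},
    {"age": "young", "sex": "F", "zip": "z3"},
    {"age": "young", "sex": "F", "zip": "z4"},
    {"age": "old", "sex": "M", "zip": "z5"},
    {"age": "old", "sex": "M", "zip": "z6"},
    {"age": "old", "sex": "F", "zip": "z7"},
    {"age": "old", "sex": "F", "zip": "z8"},
]
labels = [0, 0, 0, 1, 1, 1, 1, 0]

selected, best = hybrid_k_anonymous_selection(rows, labels, ["age", "sex", "zip"], k=2)
print(selected, best)
```

On this toy data, `zip` ranks first under the filter but is rejected because each value appears only once, `age` is accepted, and `sex` is dropped because it does not raise the score. A real evaluation would replace the resubstitution score with cross-validated accuracy of an actual classifier.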

CLC number: TP181

YANG Liu, LI Yun. CCML2021+319: Hybrid K-Anonymous Feature Selection Algorithm[J]. Journal of Computer Applications.
