Incremental attribute reduction method for incomplete hybrid data with variable precision

doi:10.11772/j.issn.1001-9081.2018041293

Journal of Computer Applications, 2018, Vol. 38, Issue (10): 2764-2771.

• Papers of the 2018 China Conference on Granular Computing and Knowledge Discovery (CGCKD 2018) •

WANG Yinglong¹, ZENG Qi¹, QIAN Wenbin¹,², SHU Wenhao³, HUANG Jintao¹

1. School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang Jiangxi 330045, China;
2. School of Software, Jiangxi Agricultural University, Nanchang Jiangxi 330045, China;
3. School of Information Engineering, East China Jiaotong University, Nanchang Jiangxi 330013, China

Received: 2018-03-30 Revised: 2018-06-21 Online: 2018-10-10 Published: 2018-10-13
Corresponding author: QIAN Wenbin
About the authors: WANG Yinglong (1970-), male, born in Jiujiang, Jiangxi, professor, Ph.D.; research interests: computational intelligence, knowledge discovery. ZENG Qi (1991-), female, born in Ganzhou, Jiangxi, master's student; research interests: rough sets, data mining. QIAN Wenbin (1984-), male, born in Yichun, Jiangxi, associate professor, Ph.D.; research interests: granular computing, three-way decisions, knowledge discovery. SHU Wenhao (1985-), female, born in Ji'an, Jiangxi, lecturer, Ph.D.; research interests: granular computing, knowledge discovery. HUANG Jintao (1995-), male, born in Yueyang, Hunan, master's student; research interests: granular computing, machine learning.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61502213, 61662023) and the Natural Science Foundation of Jiangxi Province (20161BAB212049).

Abstract: Static attribute reduction methods have high computational complexity when data are added dynamically to an incomplete hybrid decision system. To address this problem, an incremental attribute reduction method for incomplete hybrid data under variable precision was proposed. First, conditional entropy was used to measure the significance of attributes in the variable precision model. Then, the incremental update of the conditional entropy and the update mechanism of the attribute reduct were analyzed and designed in detail for the case where data are added dynamically. On this basis, an incremental attribute reduction algorithm was constructed with a heuristic greedy strategy, which dynamically updates the reduct of incomplete data that mix numeric and symbolic attributes. Experiments were carried out on five real hybrid datasets from UCI. In terms of reduction effect, when the incremental algorithm processed the Echocardiogram, Hepatitis, Autos, Credit and Dermatology datasets at an incremental scale of 90%+10%, the number of attributes was reduced from 12, 19, 25, 17 and 34 to 6, 7, 10, 11 and 13, which is 50.0%, 36.8%, 40.0%, 64.7% and 38.2% of the original attribute set, respectively. In terms of execution time, the incremental algorithm took on average 2.99 s, 3.13 s, 9.70 s, 274.19 s and 50.87 s on the five datasets, whereas the static algorithm took on average 284.92 s, 302.76 s, 1062.23 s, 3510.79 s and 667.85 s. The running time of the incremental algorithm depends on the number of instances, the number of attributes and the distribution of attribute value types in the dataset. The experimental results show that the incremental attribute reduction algorithm is significantly faster than the static algorithm and can effectively remove redundant attributes.
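The abstract names the ingredients (a conditional entropy under variable precision, a greedy heuristic, and reuse of earlier results when data grow) but gives no formulas. The following is a minimal sketch under stated assumptions, not the authors' algorithm: the tolerance test (`similar`), the entropy definition, the radius `delta`, the precision threshold `beta` and the `start` parameter are all hypothetical choices made for illustration. A missing value is assumed to match anything, numeric values match within `delta`, and a neighborhood is treated as consistent once the share of same-decision neighbors reaches `beta`.

```python
import math

MISSING = None  # assumed marker for an unknown attribute value

def similar(x, y, a, numeric, delta):
    """Tolerance test on attribute index a; a missing value matches anything."""
    u, v = x[a], y[a]
    if u is MISSING or v is MISSING:
        return True
    if a in numeric:
        return abs(u - v) <= delta   # numeric: within distance delta
    return u == v                    # symbolic: exact match

def cond_entropy(data, labels, attrs, numeric, delta=0.1, beta=1.0):
    """Average uncertainty of the decision given attrs (assumed definition)."""
    attrs = list(attrs)
    total = 0.0
    for i, x in enumerate(data):
        nb = [j for j, y in enumerate(data)
              if all(similar(x, y, a, numeric, delta) for a in attrs)]
        agree = sum(labels[j] == labels[i] for j in nb) / len(nb)
        if agree < beta:             # below the precision threshold
            total -= math.log2(agree)
    return total / len(data)

def greedy_reduct(data, labels, n_attrs, numeric, delta=0.1, beta=1.0, start=()):
    """Grow the reduct greedily until it reaches the entropy of the full attribute set."""
    def h(attrs):
        return cond_entropy(data, labels, attrs, numeric, delta, beta)
    target = h(range(n_attrs))
    reduct = list(start)             # seed with an earlier reduct, if any
    remaining = [a for a in range(n_attrs) if a not in reduct]
    while remaining and h(reduct) > target + 1e-12:
        best = min(remaining, key=lambda a: h(reduct + [a]))
        reduct.append(best)
        remaining.remove(best)
    return reduct

# Toy data: attribute 0 symbolic, attribute 1 numeric, attribute 2 symbolic.
data = [("a", 0.10, "x"), ("a", 0.15, "y"), ("b", 0.80, "x"), (None, 0.85, "y")]
labels = [0, 0, 1, 1]
first = greedy_reduct(data, labels, 3, numeric={1}, delta=0.2)

# One new row arrives; the search restarts from the earlier reduct, not from scratch.
data2 = data + [("b", 0.12, "x")]
labels2 = labels + [1]
second = greedy_reduct(data2, labels2, 3, numeric={1}, delta=0.2, start=first)
```

On the toy data the numeric attribute alone separates the two decision classes, so `first` is `[1]`; the appended row breaks that, and seeding with `first` only has to add attribute 0. Restarting from the earlier reduct is merely the simplest form of reuse; the paper additionally updates the conditional entropy itself incrementally, which this sketch does not attempt.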

Key words: rough set, attribute reduction, neighborhood relation, incremental method, incomplete hybrid data
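The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below recomputes the retained-attribute percentages and derives the static-to-incremental time ratio from the reported averages; the variable names and the `summarize` helper are illustrative, and the ratio is a derived quantity that the abstract does not state itself.

```python
# Values copied from the abstract (incremental scale 90%+10%).
datasets = ["Echocardiogram", "Hepatitis", "Autos", "Credit", "Dermatology"]
original = [12, 19, 25, 17, 34]            # attributes before reduction
reduced = [6, 7, 10, 11, 13]               # attributes after reduction
incremental_s = [2.99, 3.13, 9.70, 274.19, 50.87]          # mean time, seconds
static_s = [284.92, 302.76, 1062.23, 3510.79, 667.85]      # mean time, seconds

def summarize():
    """Return (dataset, % of attributes kept, static/incremental time ratio)."""
    rows = []
    for name, n, r, t_inc, t_sta in zip(datasets, original, reduced,
                                        incremental_s, static_s):
        rows.append((name, round(100 * r / n, 1), round(t_sta / t_inc, 1)))
    return rows

for name, kept, ratio in summarize():
    print(f"{name:15s} kept {kept:5.1f}%  static/incremental = {ratio:6.1f}x")
```

The recomputed percentages agree with the abstract (50.0, 36.8, 40.0, 64.7, 38.2). The time ratio ranges from about 12.8 on Credit to about 109.5 on Autos, which is consistent with the statement that the gain depends on the size and attribute make-up of each dataset.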

CLC number: TP18

Cite this article: WANG Yinglong, ZENG Qi, QIAN Wenbin, SHU Wenhao, HUANG Jintao. Incremental attribute reduction method for incomplete hybrid data with variable precision[J]. Journal of Computer Applications, 2018, 38(10): 2764-2771.

