Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (10): 2764-2771. DOI: 10.11772/j.issn.1001-9081.2018041293

• Papers of the 2018 China Conference on Granular Computing and Knowledge Discovery (CGCKD 2018) •

Incremental attribute reduction method for incomplete hybrid data with variable precision

WANG Yinglong1, ZENG Qi1, QIAN Wenbin1,2, SHU Wenhao3, HUANG Jintao1

  1. School of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang Jiangxi 330045, China;
  2. School of Software, Jiangxi Agricultural University, Nanchang Jiangxi 330045, China;
  3. School of Information Engineering, East China Jiaotong University, Nanchang Jiangxi 330013, China
  • Received: 2018-03-30 Revised: 2018-06-21 Online: 2018-10-10 Published: 2018-10-13
  • Corresponding author: QIAN Wenbin
  • About the authors: WANG Yinglong (born 1970), male, from Jiujiang, Jiangxi, professor, Ph.D.; research interests: computational intelligence, knowledge discovery. ZENG Qi (born 1991), female, from Ganzhou, Jiangxi, master's student; research interests: rough sets, data mining. QIAN Wenbin (born 1984), male, from Yichun, Jiangxi, associate professor, Ph.D.; research interests: granular computing, three-way decisions, knowledge discovery. SHU Wenhao (born 1985), female, from Ji'an, Jiangxi, lecturer, Ph.D.; research interests: granular computing, knowledge discovery. HUANG Jintao (born 1995), male, from Yueyang, Hunan, master's student; research interests: granular computing, machine learning.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61502213, 61662023) and the Natural Science Foundation of Jiangxi Province (20161BAB212049).

Abstract: To address the high computational complexity of static attribute reduction methods when data are added dynamically to an incomplete hybrid decision system, an incremental attribute reduction method for incomplete hybrid data with variable precision was proposed. Firstly, conditional entropy was used to measure the significance of attributes under the variable precision model. Then, the incremental update of the conditional entropy and the update mechanism of the attribute reduct were analyzed and designed in detail for the case where data are added dynamically. On this basis, an incremental attribute reduction algorithm was constructed with a heuristic greedy strategy, which achieves dynamic updating of the attribute reduct for incomplete hybrid data containing both numerical and symbolic attributes. Experiments were conducted on five real hybrid datasets from UCI. In terms of reduction effect, when the Echocardiogram, Hepatitis, Autos, Credit and Dermatology datasets were processed by the incremental algorithm with an incremental scale of 90%+10%, the number of attributes was reduced from 12, 19, 25, 17 and 34 to 6, 7, 10, 11 and 13 respectively, accounting for 50.0%, 36.8%, 40.0%, 64.7% and 38.2% of the original attribute sets. In terms of execution time, the average time of the incremental algorithm on the five datasets was 2.99 s, 3.13 s, 9.70 s, 274.19 s and 50.87 s, whereas that of the static algorithm was 284.92 s, 302.76 s, 1062.23 s, 3510.79 s and 667.85 s; the time consumed by the incremental algorithm is related to the number of instances, the number of attributes and the distribution of attribute value types in the dataset. The experimental results show that the incremental attribute reduction algorithm is significantly faster than the static algorithm and can effectively remove redundant attributes from the data.

Key words: rough set, attribute reduction, neighborhood relation, incremental method, incomplete hybrid data
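
The abstract names the ingredients of the method (a neighborhood relation over incomplete hybrid data, conditional entropy as the attribute significance measure, and a heuristic greedy search) but does not give their definitions. The following is only a minimal static sketch under stated assumptions, not the authors' algorithm: missing values are encoded as `None` and treated as matching anything, numerical values are neighbors when they differ by at most a hypothetical radius `delta`, and the conditional entropy uses one common neighborhood form, -(1/n) Σ log2(|N(x) ∩ [x]_D| / |N(x)|). The variable-precision threshold and the incremental update are not modeled here, and all function names are illustrative.

```python
import math

def tolerant(x, y, numeric, delta):
    """Assumed tolerance test: a missing value (None) matches anything."""
    if x is None or y is None:
        return True
    if numeric:
        return abs(x - y) <= delta   # numerical attribute: within radius delta
    return x == y                    # symbolic attribute: exact match

def neighborhoods(rows, attrs, numeric_attrs, delta):
    """For each object, the set of objects indistinguishable from it on attrs."""
    n = len(rows)
    return [
        {j for j in range(n)
         if all(tolerant(rows[i][a], rows[j][a], a in numeric_attrs, delta)
                for a in attrs)}
        for i in range(n)
    ]

def conditional_entropy(rows, labels, attrs, numeric_attrs, delta):
    """Neighborhood conditional entropy of the decision given attrs (assumed form)."""
    total = 0.0
    for i, nb in enumerate(neighborhoods(rows, attrs, numeric_attrs, delta)):
        same = sum(1 for j in nb if labels[j] == labels[i])
        total += math.log2(same / len(nb))   # nb always contains i, so same >= 1
    return -total / len(rows)

def greedy_reduct(rows, labels, all_attrs, numeric_attrs, delta=0.1, eps=1e-12):
    """Forward greedy selection: add the attribute that lowers the entropy most
    until the entropy of the full attribute set is reached."""
    target = conditional_entropy(rows, labels, all_attrs, numeric_attrs, delta)
    reduct, remaining = [], list(all_attrs)
    current = conditional_entropy(rows, labels, reduct, numeric_attrs, delta)
    while remaining and current > target + eps:
        best = min(
            remaining,
            key=lambda a: conditional_entropy(
                rows, labels, reduct + [a], numeric_attrs, delta),
        )
        reduct.append(best)
        remaining.remove(best)
        current = conditional_entropy(rows, labels, reduct, numeric_attrs, delta)
    return reduct

# Toy data (made up for illustration): columns 0 and 2 are numerical,
# column 1 is symbolic, and one value is missing.
toy_rows = [
    (0.10, "a", 1.0),
    (0.15, "a", 5.0),
    (0.90, "b", 1.1),
    (None, "b", 5.1),
]
toy_labels = ["yes", "yes", "no", "no"]
toy_reduct = greedy_reduct(toy_rows, toy_labels, [0, 1, 2], {0, 2}, delta=0.2)
print(toy_reduct)  # [1]
```

On the toy data, the symbolic column alone separates the two decision classes, so the search stops after selecting it. A static procedure of this kind recomputes every neighborhood from scratch whenever objects are added, which is the cost the incremental method described in the abstract is designed to avoid.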

CLC Number:
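
The figures reported in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract; the helper names `retained_percent` and `speedup` are illustrative, and the speedup ratio is a derived quantity that the abstract itself does not report.

```python
datasets = ["Echocardiogram", "Hepatitis", "Autos", "Credit", "Dermatology"]
original_attrs = [12, 19, 25, 17, 34]          # attributes before reduction
reduced_attrs = [6, 7, 10, 11, 13]             # attributes after reduction
incremental_s = [2.99, 3.13, 9.70, 274.19, 50.87]        # average time, seconds
static_s = [284.92, 302.76, 1062.23, 3510.79, 667.85]    # average time, seconds

def retained_percent(reduced, original):
    """Share of the original attribute set kept in the reduct, to one decimal."""
    return round(100.0 * reduced / original, 1)

def speedup(static_time, incremental_time):
    """How many times faster the incremental run is than the static run."""
    return static_time / incremental_time

percents = [retained_percent(r, o) for r, o in zip(reduced_attrs, original_attrs)]
speedups = [speedup(s, i) for s, i in zip(static_s, incremental_s)]

for name, pct, ratio in zip(datasets, percents, speedups):
    print(f"{name:15s} retained {pct:5.1f}%  speedup {ratio:6.1f}x")
```

The recomputed percentages agree with the 50.0%, 36.8%, 40.0%, 64.7% and 38.2% stated in the abstract. The derived speedups range from roughly 13 times (Credit, Dermatology) to roughly 110 times (Autos).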