Journal of Computer Applications, 2018, Vol. 38, Issue (1): 97-103. DOI: 10.11772/j.issn.1001-9081.2017061372

• Artificial Intelligence

Uncertainty measurement and attribute reduction in incomplete neighborhood rough set

YAO Sheng1,2, WANG Jie2, XU Feng2, CHEN Ju2

  1. Key Laboratory of Intelligent Computing & Signal Processing, Ministry of Education, Anhui University, Hefei Anhui 230601, China;
    2. College of Computer Science and Technology, Anhui University, Hefei Anhui 230601, China
  • Received: 2017-06-05 Revised: 2017-08-27 Online: 2018-01-10 Published: 2018-01-22
  • Corresponding author: WANG Jie
  • About the authors: YAO Sheng (born 1979), female, native of Hefei, Anhui, lecturer, Ph.D.; research interests: rough sets, granular computing, big data. WANG Jie (born 1993), male, native of Lu'an, Anhui, M.S. candidate; research interest: rough sets. XU Feng (born 1993), male, native of Lu'an, Anhui, M.S. candidate; research interest: rough sets. CHEN Ju (born 1993), female, native of Chuzhou, Anhui, M.S. candidate; research interest: rough sets.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61602004, 61300057), the Natural Science Foundation of Anhui Province (1508085MF127), the Key Project of Natural Science Research of Anhui Higher Education Institutions (KJ2016A041), the Public Bidding Project of Co-Innovation Center for Information Supply & Assurance Technology, Anhui University (ADXXBZ2014-5, ADXXBZ2014-6), and the Doctoral Scientific Research Foundation of Anhui University (J10113190072).

Abstract: Existing attribute reduction algorithms are not well suited to incomplete data in which numerical and symbolic attributes coexist. To address this, an extended incomplete neighborhood rough set model was proposed. Firstly, the distance between missing attribute values was defined by considering the probability distribution of the attribute values, so that incomplete data with mixed attributes can be measured. Secondly, neighborhood mixed entropy was defined to evaluate the quality of attribute reduction, its related properties and theorems were analyzed and proved, and an attribute reduction algorithm for the incomplete neighborhood rough set based on neighborhood mixed entropy was constructed. Finally, experiments were carried out on seven datasets selected from the UCI repository, and the proposed algorithm was compared with Attribute Reduction based on Dependency (ARD), Attribute Reduction based on neighborhood Conditional Entropy (ARCE) and Attribute Reduction based on Neighborhood Combination Measure (ARNCM). Theoretical analysis and experimental results show that the proposed algorithm obtains reducts with about 1 and 7 fewer attributes than ARD and ARCE respectively and about as many as ARNCM, while improving classification accuracy by about 2.5, 2.1 and 0.8 percentage points over ARD, ARCE and ARNCM respectively. The proposed algorithm therefore obtains smaller reducts while achieving higher classification accuracy.
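The abstract names two components: a probability-based distance for missing values in mixed numerical/symbolic data, and an entropy-guided attribute reduction. The sketch below illustrates both under stated assumptions and is not the authors' method. A missing value is replaced by its expected distance under the attribute's empirical value distribution, objects are compared with a Euclidean combination of per-attribute distances, and, because the exact definition of neighborhood mixed entropy is not given here, standard neighborhood conditional entropy is swapped in as the quality measure. All function names (`attribute_stats`, `attr_distance`, `neighborhood_entropy`, `greedy_reduct`), the data encoding, and the parameters `delta` and `eps` are hypothetical.

```python
import math
from collections import Counter

# Assumed encoding (not from the paper): each object is a list of attribute
# values; numerical values are pre-normalised to [0, 1], symbolic values are
# strings, and a missing value is None.


def attribute_stats(data, a):
    """Empirical probability distribution of the known values of attribute a."""
    known = [row[a] for row in data if row[a] is not None]
    return {v: c / len(known) for v, c in Counter(known).items()}


def attr_distance(u, v, dist, numeric):
    """Distance on one attribute. A missing value is replaced by the expected
    distance under the attribute's empirical distribution `dist` (an
    illustrative choice, not necessarily the paper's definition)."""
    if u is not None and v is not None:
        return abs(u - v) if numeric else float(u != v)
    if u is None and v is None:
        # Expected distance between two independent draws from `dist`.
        if numeric:
            return sum(p * q * abs(s - t)
                       for s, p in dist.items() for t, q in dist.items())
        return 1.0 - sum(p * p for p in dist.values())
    known = u if u is not None else v
    if numeric:
        # Expected |known - X| with X drawn from `dist`.
        return sum(p * abs(known - t) for t, p in dist.items())
    # Probability that the missing symbol differs from the known one.
    return 1.0 - dist.get(known, 0.0)


def distance(x, y, attrs, stats, numeric):
    """Euclidean combination of per-attribute distances over the subset attrs."""
    return math.sqrt(sum(attr_distance(x[a], y[a], stats[a], numeric[a]) ** 2
                         for a in attrs))


def neighborhood_entropy(data, labels, attrs, stats, numeric, delta):
    """Stand-in quality measure: neighborhood conditional entropy H(D | attrs).
    An object is always placed in its own neighborhood (reflexivity), even
    when its missing values give it a non-zero expected self-distance."""
    n = len(data)
    h = 0.0
    for i in range(n):
        nb = [j for j in range(n)
              if j == i or distance(data[i], data[j], attrs, stats, numeric) <= delta]
        same = sum(1 for j in nb if labels[j] == labels[i])
        h -= math.log(same / len(nb))
    return h / n


def greedy_reduct(data, labels, numeric, delta=0.15, eps=1e-9):
    """Forward greedy selection: repeatedly add the attribute that lowers the
    entropy most; stop when no remaining attribute lowers it by more than eps."""
    stats = [attribute_stats(data, a) for a in range(len(numeric))]

    def score(attrs):
        return neighborhood_entropy(data, labels, attrs, stats, numeric, delta)

    reduct, remaining = [], list(range(len(numeric)))
    current = score(reduct)
    while remaining:
        best = min(remaining, key=lambda a: score(reduct + [a]))
        best_score = score(reduct + [best])
        if best_score > current - eps:
            break
        reduct.append(best)
        remaining.remove(best)
        current = best_score
    return reduct
```

For example, with objects `[[0.0, 'a'], [0.1, 'a'], [0.2, None], [0.8, 'a'], [0.9, 'a'], [1.0, 'a']]`, labels `[0, 0, 0, 1, 1, 1]` and `numeric=[True, False]`, the numerical attribute alone yields class-pure neighborhoods while the constant symbolic attribute adds nothing, so `greedy_reduct` returns `[0]`. The stopping rule here ("no further decrease") is one common choice; many neighborhood rough set algorithms instead stop once the measure on the candidate subset equals its value on the full attribute set.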

Key words: rough set, attribute reduction, incomplete decision information system, mixed attributes, neighborhood mixed entropy
