《计算机应用》唯一官方网站 ›› 2022, Vol. 42 ›› Issue (2): 449-456.DOI: 10.11772/j.issn.1001-9081.2021071170

• 数据科学与技术 • 上一篇    

基于局部条件区分能力的高效属性约简算法

康猛, 蒙祖强()   

  1. 广西大学 计算机与电子信息学院,广西 南宁 530004
  • 收稿日期:2021-07-07 修回日期:2021-08-06 接受日期:2021-08-09 发布日期:2021-11-02 出版日期:2022-02-10
  • 通讯作者: 蒙祖强
  • 作者简介:康猛(1995—),男,安徽亳州人,硕士研究生,主要研究方向:粒计算、数据挖掘、知识发现;
    蒙祖强(1974—),男,广西河池人,教授,博士,CCF高级会员,主要研究方向:跨模态智能、粒计算、数据挖掘、知识发现。
  • 基金资助:
    国家自然科学基金资助项目(61762009)

Efficient attribute reduction algorithm based on local conditional discernibility

Meng KANG, Zuqiang MENG()   

  1. School of Computer,Electronics and Information,Guangxi University,Nanning Guangxi 530004,China
  • Received:2021-07-07 Revised:2021-08-06 Accepted:2021-08-09 Online:2021-11-02 Published:2022-02-10
  • Contact: Zuqiang MENG
  • About author:KANG Meng, born in 1995, M. S. candidate. His research interests include granular computing, data mining, knowledge discovery.
    First author contact:MENG Zuqaing, born in 1974, Ph. D., professor. His research interests include cross-modal intelligence, granular computing, data mining, knowledge discovery.
  • Supported by:
    National Natural Science Foundation of China(61762009)

摘要:

基于区分矩阵的传统属性约简方法具有直观易理解的优点,但时间和空间复杂度都很高,当数据规模较大或条件属性较多时,会无法快速得到约简结果。为解决该问题,在区分关系的基础上构造了条件区分能力来进行属性选择,提出一种基于条件区分能力的属性约简算法。而为了进一步加快属性重要性的计算、提高约简效率,依据大数定律中频率的稳定性,通过采样的方式将条件区分能力扩展为局部条件区分能力,提出基于局部条件区分能力的属性约简算法。理论证明了条件区分能力在属性的选择上比正区域更严格,并将该算法与基于区分度的高效前向属性约简算法(FAR-DV)、基于k近邻属性重要度和相关系数的属性约简算法(K2NCRS)及基于正区域排序升序决策表的快速正区域约简算法(FPRA)进行了对比。实验结果显示,该算法在属性选择顺序、约简率和分类精度上与FAR-DV基本一致,在约简效率上比上述三种算法提高了10倍以上;且随着数据规模的增大或条件属性的增多,在约简效率上的提升越明显。可以看出,所提算法具有更低的时空复杂度,适用于海量数据属性约简。

关键词: 属性约简, 区分矩阵, 区分能力, 条件区分能力, 大数定律, 局部条件区分能力

Abstract:

The traditional attribute reduction method based on discernibility matrix is intuitive and easy to understand. However, its time and space complexities are high, so when dealing with large scale data or many conditional attributes, it will not be able to get the reduction result quickly. In order to solve the problem, the conditional discernibility was constructed based on the discernibility relation for attribute selection, and an attribute reduction algorithm based on conditional discernibility was proposed. In order to further accelerate the calculation of attribute importance and improve the efficiency of attribute reduction, according to the stability of frequency in the law of large numbers, the conditional discernibility was extended to local conditional discernibility by sampling, and an attribute reduction algorithm based on local conditional discernibility was proposed. It was theoretically proved that the conditional discernibility was stricter than the positive region in attribute selection. And this algorithm was compared with efficient Forward Attribute Reduction algorithm from Discernibility View (FAR-DV), attribute reduction algorithm based on k-Nearest Neighbor attribute importance and Correlation Coefficient (K2NCRS) and Fast Positive Region reduction Algorithm based on positive region sort ascending decision table (FPRA). Experimental results show that the proposed algorithm is similar to FAR-DV in attribute selection order, reduction rate and classification accuracy. Compared with the above three algorithms, the proposed algorithm has the reduction efficiency improved more than ten times. With the increase of data scale or conditional attributes, the reduction efficiency improvement of this algorithm is higher. It can be seen that the proposed algorithm has lower time and space complexities and is suitable for the attribute reduction of massive data.

Key words: attribute reduction, discernibility matrix, discernibility, conditional discernibility, law of large numbers, local conditional discernibility

中图分类号: