Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (12): 3426-3433.DOI: 10.11772/j.issn.1001-9081.2019049238

• The 17th China Conference on Machine Learning (CCML 2019)

Multi-scale attribute granule based quick positive region reduction algorithm

CHEN Manru1,2, ZHANG Nan1,2, TONG Xiangrong1,2, DONGYE Shenglong1,2, YANG Wenjing1,2   

  1. Key Lab for Data Science and Intelligence Technology of Shandong Higher Education Institutes (Yantai University), Yantai Shandong 264005, China;
    2. School of Computer Science and Control Engineering, Yantai University, Yantai Shandong 264005, China
  • Received:2019-04-29 Revised:2019-07-03 Online:2019-12-10 Published:2019-08-26
  • Contact: ZHANG Nan
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61572418, 61572419, 11801491) and the Natural Science Foundation of Shandong Province (ZR2018BA004).

基于多尺度属性粒策略的快速正域约简算法

陈曼如1,2, 张楠1,2, 童向荣1,2, 东野升龙1,2, 杨文静1,2   

  1. 数据科学与智能技术山东省高校重点实验室(烟台大学), 山东 烟台 264005;
    2. 烟台大学 计算机与控制工程学院, 山东 烟台 264005
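  • About the authors: CHEN Manru (born 1993), female, from Yinchuan, Ningxia, M.S. candidate; research interests: rough sets, data mining, machine learning. ZHANG Nan (born 1979), male, from Yantai, Shandong, lecturer, Ph.D., CCF member; research interests: rough sets, granular computing, artificial intelligence. TONG Xiangrong (born 1975), male, from Yantai, Shandong, professor, Ph.D.; research interests: multi-agent systems, distributed artificial intelligence. DONGYE Shenglong (born 1996), male, from Linyi, Shandong; research interests: rough sets, data mining, artificial intelligence. YANG Wenjing (born 1996), female, from Jining, Shandong, M.S. candidate; research interests: rough sets, data mining, machine learning.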
  • 基金资助:
    国家自然科学基金资助项目(61572418,61572419,11801491);山东省自然科学基金资助项目(ZR2018BA004)。

Abstract: In classical heuristic positive-region attribute reduction algorithms, each iteration adds to the selected feature attribute subset the single attribute with the maximum dependency degree with respect to the current positive region. This leads to a large number of iterations and low efficiency, and makes such algorithms hard to apply to feature selection on high-dimensional, large-scale datasets. To address these problems, the monotonic relationship between positive regions in a decision system was studied, a formal description of the Multi-Scale Attribute Granule (MSAG) was given, and a Multi-scale Attribute Granule based Quick Positive Region reduction algorithm (MAG-QPR) was proposed. Each MSAG contains several attributes and can contribute a large positive region to the selected feature attribute subset. Consequently, adding an MSAG in each iteration reduces the number of iterations and lets the selected feature attribute subset approach the positive-region discernibility of the full condition attribute set more quickly, which improves the computational efficiency of heuristic positive-region reduction. Experiments were conducted on 8 UCI datasets. On the datasets Lung Cancer, Flag and German, the running-time speedup ratios of MAG-QPR over the general improved Feature Selection algorithm based on the Positive Approximation-Positive Region (FSPA-PR), the general improved Feature Selection algorithm based on the Positive Approximation-Shannon's Conditional Entropy (FSPA-SCE), the Backward Greedy Reduction Algorithm for positive region Preservation (BGRAP) and the Backward Greedy Reduction Algorithm for Generalized decision preservation (BGRAG) are 9.64, 15.70, 5.03 and 2.50; 3.93, 7.55, 1.69 and 4.57; and 3.61, 6.49, 1.30 and 9.51, respectively. The experimental results show that MAG-QPR improves algorithm efficiency and achieves better classification accuracy.
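
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a classical forward greedy positive-region reduction, which adds one attribute per iteration. It is not the MAG-QPR algorithm and is not taken from the paper; the function names, the tuple-based table layout and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict

def positive_region(rows, cond_idx, dec_idx):
    """Return indices of objects whose equivalence class under the
    attributes in cond_idx carries a single decision value."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in cond_idx)].append(i)
    pos = set()
    for members in blocks.values():
        if len({rows[i][dec_idx] for i in members}) == 1:
            pos.update(members)
    return pos

def dependency(rows, cond_idx, dec_idx):
    """Dependency degree: |POS| / |U|."""
    return len(positive_region(rows, cond_idx, dec_idx)) / len(rows)

def greedy_positive_region_reduct(rows, cond_attrs, dec_idx):
    """Classical baseline (hypothetical helper): add ONE attribute per
    iteration, the one giving the largest positive region, until the
    positive region of the full condition attribute set is reached.
    No redundancy-removal pass is performed in this sketch."""
    target = len(positive_region(rows, list(cond_attrs), dec_idx))
    selected, remaining = [], list(cond_attrs)
    while len(positive_region(rows, selected, dec_idx)) < target:
        best = max(
            remaining,
            key=lambda a: len(positive_region(rows, selected + [a], dec_idx)),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy decision table (assumed data): columns 0-2 are condition
# attributes, column 3 is the decision attribute.
rows = [
    (0, 0, 0, "n"),
    (0, 1, 0, "y"),
    (1, 0, 1, "y"),
    (1, 1, 1, "n"),
]
reduct = greedy_positive_region_reduct(rows, [0, 1, 2], 3)
```

Every pass of the `while` loop re-evaluates all remaining attributes but commits only one, which is the per-iteration cost the abstract identifies; adding a group of attributes per pass, as the MSAG idea proposes, would shorten this loop.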

Key words: attribute reduction, rough set, multi-scale attribute granule, positive region reduction, quick reduction algorithm

摘要: 传统启发式正域属性约简算法在每次迭代的过程中需要添加当前正域依赖度最大的属性进入已选定的特征属性子集,算法迭代次数多且效率低,难以应用于高维大规模数据集的特征选择中。针对上述问题,研究决策系统中正域之间的单调关系,给出了多尺度属性粒(MSAG)的形式化描述,提出了一种基于多尺度属性粒的快速正域约简算法(MAG-QPR)。由于多尺度属性粒包含多个属性,可以对已选定的特征属性子集提供较大的正域,因此,通过每次迭代添加MSAG,可以达到减少迭代次数和使选定的特征属性子集能更快地趋近于条件属性全集的正域分辨能力的目的,从而提高了启发式正域约简算法的效率。在实验部分,选取8组UCI数据进行实验,对于数据集Lung Cancer、Flag和German,MAG-QPR与基于正向近似的正域保持属性约简算法(FSPA-PR)、基于正向近似的条件熵属性约简算法(FSPA-SCE)、后向贪婪正域保持属性约简算法(BGRAP)和后向贪婪启发式广义决策保持属性约简算法(BGRAG)的运行时间加速比分别为9.64、15.70、5.03、2.50;3.93、7.55、1.69、4.57;3.61、6.49、1.30、9.51。实验结果表明,所提算法MAG-QPR提高了算法效率,具有更好的分类精度。

关键词: 属性约简, 粗糙集, 多尺度属性粒, 正域约简, 快速约简算法

CLC Number: