Journal of Computer Applications, 2020, Vol. 40, Issue (2): 518-523. DOI: 10.11772/j.issn.1001-9081.2019091642

CCF Bigdata 2019

Distributed rough set attribute reduction algorithm under Spark

Xiajie ZHANG1, Jinghua ZHU1,2, Yang CHEN1

  1. School of Computer Science and Technology, Heilongjiang University, Harbin Heilongjiang 150080, China
  2. Key Laboratory of Database and Parallel Computing of Heilongjiang Province, Harbin Heilongjiang 150080, China
  • Received: 2019-08-30; Revised: 2019-09-26; Accepted: 2019-10-18; Online: 2019-10-31; Published: 2020-02-10
  • Contact: Jinghua ZHU
  • About author: ZHANG Xiajie, born in 1995, from Wenzhou, Zhejiang, M. S. candidate. His research interests include data mining and rough set theory.
    CHEN Yang, born in 1996, M. S. candidate. His research interests include data mining and discretization.
  • Supported by:
    the General Program of the Natural Science Foundation of Heilongjiang Province (F2018028)


章夏杰1, 朱敬华1,2, 陈杨1


Attribute reduction (feature selection) is an important part of data preprocessing. Most attribute reduction methods use attribute dependence as the criterion for selecting attribute subsets. A Fast Dependence Calculation (FDC) method was designed, which calculates the dependence by directly searching for the objects that belong to the relative positive region. Because the relative positive region does not need to be computed in advance, FDC is significantly faster than traditional methods. In addition, the Whale Optimization Algorithm (WOA) was improved so that it can be applied effectively to rough set attribute reduction. Combining these two methods, a Spark-based distributed rough set attribute reduction algorithm named SP-WOFRST was proposed and compared with SP-RST, another Spark-based rough set attribute reduction algorithm, on two large synthetic data sets. Experimental results show that SP-WOFRST outperforms SP-RST in both accuracy and speed.
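In standard rough set theory, the dependence of the decision attribute D on a condition attribute subset B is gamma_B(D) = |POS_B(D)| / |U|, where an object lies in the positive region POS_B(D) exactly when every object sharing its values on B also shares its decision. The sketch below is not the paper's FDC method, whose details are not given in this abstract; it is a minimal illustration, assuming discretized integer-valued attributes, of computing gamma in a map/reduce style similar to Spark: each partition groups its objects by their values on B, the partial statistics are merged, and objects in single-decision equivalence classes are counted directly. The function names, row format, and toy table are hypothetical.

```python
from collections import defaultdict
from functools import reduce
from typing import DefaultDict, List, Sequence, Set, Tuple

# Hypothetical row format: (condition attribute values, decision value).
Row = Tuple[Sequence[int], int]
Key = Tuple[int, ...]
Stats = Tuple[DefaultDict[Key, int], DefaultDict[Key, Set[int]]]


def partition_stats(rows: List[Row], attrs: Sequence[int]) -> Stats:
    """Map step: group one partition's objects by their values on `attrs`."""
    counts: DefaultDict[Key, int] = defaultdict(int)
    decisions: DefaultDict[Key, Set[int]] = defaultdict(set)
    for values, d in rows:
        key = tuple(values[a] for a in attrs)
        counts[key] += 1       # size of this equivalence class so far
        decisions[key].add(d)  # decision values seen in this class
    return counts, decisions


def merge_stats(a: Stats, b: Stats) -> Stats:
    """Reduce step: combine the statistics of two partitions."""
    counts: DefaultDict[Key, int] = defaultdict(int, a[0])
    decisions: DefaultDict[Key, Set[int]] = defaultdict(
        set, {k: set(v) for k, v in a[1].items()}
    )
    for k, c in b[0].items():
        counts[k] += c
    for k, ds in b[1].items():
        decisions[k] |= ds
    return counts, decisions


def dependency(partitions: List[List[Row]], attrs: Sequence[int]) -> float:
    """gamma_B(D) = |POS_B(D)| / |U| for the attribute subset B = attrs."""
    empty: Stats = (defaultdict(int), defaultdict(set))
    counts, decisions = reduce(
        merge_stats, (partition_stats(p, attrs) for p in partitions), empty
    )
    n = sum(counts.values())
    # An object is in the positive region iff its whole equivalence class
    # carries a single decision value, so sum the sizes of those classes.
    pos = sum(c for k, c in counts.items() if len(decisions[k]) == 1)
    return pos / n if n else 0.0


# Toy decision table split into two "partitions" (hypothetical data).
PARTITIONS: List[List[Row]] = [
    [((0, 1, 0), 0), ((0, 1, 1), 1), ((1, 0, 0), 0)],
    [((1, 0, 0), 1), ((0, 0, 1), 1), ((1, 1, 0), 0)],
]

print(dependency(PARTITIONS, [0, 1, 2]))  # 4 of 6 objects are consistent
print(dependency(PARTITIONS, [1, 2]))     # same value: {1, 2} keeps gamma
```

On this toy table, dropping attribute 0 leaves gamma at 4/6, while dropping either remaining attribute lowers it (to 1/3 for {2} and 0 for {1}), so {1, 2} is a reduct; this is the kind of criterion a search procedure such as WOA would evaluate for each candidate subset. Porting the sketch to Spark could, for example, run `partition_stats` inside `RDD.mapPartitions` and `merge_stats` in `RDD.reduce`; that mapping is an illustration, not the paper's implementation.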

Key words: rough set, Apache Spark, Whale Optimization Algorithm (WOA), feature selection, attribute reduction



