Journal of Computer Applications (official website) ›› 2022, Vol. 42 ›› Issue (5): 1355-1366. DOI: 10.11772/j.issn.1001-9081.2021030497

• Artificial Intelligence •

Feature selection algorithm based on neighborhood rough set and monarch butterfly optimization

Lin SUN1,2, Jing ZHAO1, Jiucheng XU1,2, Xinya WANG1

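  1. College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453007, China
  2. Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province (Henan Normal University), Xinxiang, Henan 453007, China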
  • Received: 2021-04-02 Revised: 2021-09-15 Accepted: 2021-09-22 Online: 2022-06-11 Published: 2022-05-10
  • Corresponding author: Lin SUN
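  • About the authors: SUN Lin (born 1979), male, from Nanyang, Henan; associate professor, Ph.D., CCF member. His main research interests are granular computing, data mining, machine learning, and bioinformatics. sunlin@htu.edu.cn
    ZHAO Jing (born 1996), female, from Luoyang, Henan; M.S. candidate. Her main research interests are data mining and machine learning.
    XU Jiucheng (born 1963), male, from Luoyang, Henan; professor, doctoral supervisor, Ph.D., CCF senior member. His main research interests are granular computing, data mining, and machine learning.
    WANG Xinya (born 1997), female, from Pingdingshan, Henan; M.S. candidate. Her main research interests are data mining and machine learning.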
  • Funding:
    National Natural Science Foundation of China (62076089); Key Scientific and Technological Project of Henan Province (212102210136)

Feature selection algorithm based on neighborhood rough set and monarch butterfly optimization

Lin SUN1,2, Jing ZHAO1, Jiucheng XU1,2, Xinya WANG1

  1. College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan 453007, China
  2. Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province (Henan Normal University), Xinxiang, Henan 453007, China
  • Received: 2021-04-02 Revised: 2021-09-15 Accepted: 2021-09-22 Online: 2022-06-11 Published: 2022-05-10
  • Contact: Lin SUN
  • About the authors: SUN Lin, born in 1979, Ph.D., associate professor. His research interests include granular computing, data mining, machine learning, and bioinformatics.
    ZHAO Jing, born in 1996, M.S. candidate. Her research interests include data mining and machine learning.
    XU Jiucheng, born in 1963, Ph.D., professor. His research interests include granular computing, data mining, and machine learning.
    WANG Xinya, born in 1997, M.S. candidate. Her research interests include data mining and machine learning.
  • Supported by:
    National Natural Science Foundation of China (62076089); Key Scientific and Technological Project of Henan Province (212102210136)

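Abstract:

The classical monarch butterfly optimization (MBO) algorithm does not handle continuous data well, and rough set models lack the capacity to process large-scale, high-dimensional, complex data. To address these problems, a feature selection algorithm based on neighborhood rough set (NRS) and MBO is proposed. First, local disturbance and a group division strategy are combined with the MBO algorithm, and a transmission mechanism is constructed to form a binary MBO (BMBO) algorithm. Second, a mutation operator is introduced to strengthen the exploration ability of the algorithm, yielding a BMBO algorithm based on the mutation operator (BMBOM). Then, a fitness function is constructed from the neighborhood dependence degree of NRS, and the fitness values of the initialized feature subsets are evaluated and sorted. Finally, the BMBOM algorithm searches iteratively for the optimal feature subset, and a metaheuristic feature selection algorithm is designed. The optimization performance of BMBOM is evaluated on benchmark functions, and the classification ability of the proposed feature selection algorithm is evaluated on UCI datasets. Experimental results show that, on five benchmark functions, the best value, worst value, mean, and standard deviation obtained by BMBOM are clearly better than those of MBO and particle swarm optimization (PSO). On UCI datasets, compared with optimized feature selection algorithms based on rough set, feature selection algorithms combining rough set with optimization algorithms, feature selection algorithms combining NRS with optimization algorithms, and feature selection algorithms based on binary grey wolf optimization, the proposed algorithm performs well on three metrics (classification accuracy, number of selected features, and fitness value) and can select an optimal feature subset with few features and high classification accuracy.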
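The binarization and mutation steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption: an S-shaped (sigmoid) function stands in for the transmission mechanism, independent bit-flip mutation stands in for the mutation operator, and the names `sigmoid_transfer`, `binarize`, `mutate`, and the `rate` parameter are hypothetical.

```python
import math
import random


def sigmoid_transfer(x):
    """Map a real value to a probability in [0, 1] (assumed S-shaped transfer)."""
    # Two branches keep math.exp from overflowing for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position, rng):
    """Turn a continuous position vector into a 0/1 feature mask.

    Bit j is set to 1 with probability sigmoid_transfer(position[j]).
    """
    return [1 if rng.random() < sigmoid_transfer(x) else 0 for x in position]


def mutate(mask, rate, rng):
    """Flip each bit independently with probability `rate` (assumed operator)."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


rng = random.Random(0)                      # seeded for repeatability
mask = binarize([2.0, -2.0, 0.0, 5.0], rng)  # candidate feature subset
mutated = mutate(mask, rate=0.1, rng=rng)    # small random perturbation
```

A mask entry of 1 means the corresponding feature is selected. Mutation here serves only to show how a binary search can escape a subset it would otherwise keep revisiting.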

Keywords: monarch butterfly optimization, feature selection, neighborhood rough set, neighborhood dependence degree, binary

Abstract:

The classical Monarch Butterfly Optimization (MBO) algorithm cannot handle continuous data well, and the rough set model cannot adequately process large-scale, high-dimensional, complex data. To address these problems, a new feature selection algorithm based on Neighborhood Rough Set (NRS) and MBO was proposed. Firstly, local disturbance and a group division strategy were combined with the MBO algorithm, and a transmission mechanism was constructed to form a Binary MBO (BMBO) algorithm. Secondly, a mutation operator was introduced to enhance the exploration ability of this algorithm, and a BMBO based on Mutation operator (BMBOM) algorithm was proposed. Then, a fitness function was developed based on the neighborhood dependence degree in NRS, and the fitness values of the initialized feature subsets were evaluated and sorted. Finally, the BMBOM algorithm was used to search for the optimal feature subset through continuous iterations, and a meta-heuristic feature selection algorithm was designed. The optimization performance of the BMBOM algorithm was evaluated on benchmark functions, and the classification performance of the proposed feature selection algorithm was evaluated on UCI datasets. Experimental results show that the proposed BMBOM algorithm is significantly better than the MBO and Particle Swarm Optimization (PSO) algorithms in terms of the optimal value, worst value, average value, and standard deviation on five benchmark functions. On UCI datasets, the proposed feature selection algorithm was compared with optimized feature selection algorithms based on rough set, feature selection algorithms combining rough set and optimization algorithms, feature selection algorithms combining NRS and optimization algorithms, and feature selection algorithms based on binary grey wolf optimization. It performs well on three indicators (classification accuracy, number of selected features, and fitness value) and can select an optimal feature subset with few features and high classification accuracy.
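To make the fitness construction concrete, the following Python sketch computes the standard neighborhood dependence degree of NRS and combines it with a feature-count penalty. The dependence degree is the fraction of samples whose δ-neighborhood, measured on the selected features, contains only samples of the same class. The weighted form of the fitness, the weight `alpha`, the radius `delta`, the use of Euclidean distance, and all function names are illustrative assumptions, not the paper's definitions.

```python
import math


def neighborhood_dependence(samples, labels, mask, delta):
    """Fraction of samples whose delta-neighborhood is class-consistent.

    samples: list of equal-length numeric feature vectors (ideally scaled to [0, 1])
    labels:  decision class of each sample
    mask:    0/1 list marking the selected features
    delta:   neighborhood radius (assumed Euclidean distance)
    """
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # no features selected: nothing can be discerned
    consistent = 0
    for i, x in enumerate(samples):
        in_positive_region = True
        for k, y in enumerate(samples):
            dist = math.sqrt(sum((x[j] - y[j]) ** 2 for j in selected))
            # A neighbor with a different class puts x in the boundary region.
            if dist <= delta and labels[k] != labels[i]:
                in_positive_region = False
                break
        if in_positive_region:
            consistent += 1
    return consistent / len(samples)


def fitness(samples, labels, mask, delta=0.2, alpha=0.9):
    """Assumed fitness: reward dependence, mildly reward smaller subsets."""
    gamma = neighborhood_dependence(samples, labels, mask, delta)
    reduction = 1.0 - sum(mask) / len(mask)
    return alpha * gamma + (1.0 - alpha) * reduction
```

Under this assumed form, a subset that separates the classes with fewer features scores higher than a larger subset with the same dependence degree, which matches the stated goal of few features and high accuracy. The double loop is quadratic in the number of samples, so it is suitable only for small illustrative datasets.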

Key words: Monarch Butterfly Optimization (MBO), feature selection, Neighborhood Rough Set (NRS), neighborhood dependence degree, binary

CLC Number: