Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (5): 1408-1414. DOI: 10.11772/j.issn.1001-9081.2023121829

Special topic: Evolutionary computation (2024, Issue 5)

Evolutionary bi-level adaptive local feature selection

Lin GAO1, Yu ZHOU1, Tak Wu KWONG2

  1. College of Computer Science & Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
  2. Department of Computing and Decision Sciences, Lingnan University, Hong Kong 999077, China
  • Received: 2024-01-01 Accepted: 2024-01-19 Online: 2024-04-26 Published: 2024-05-10
  • Contact: Yu ZHOU
  • About author: GAO Lin, born in 1995, from Changde, Hunan, M. S. candidate. His research interests include feature selection and multi-objective optimization.
    ZHOU Yu (first contact), born in 1987, from Xi'an, Shaanxi, Ph. D., associate professor, CCF member. His research interests include computational intelligence, multimedia information processing and smart education.
    KWONG Tak Wu, born in 1959, from Hong Kong, China, Ph. D., chair professor. His research interests include evolutionary computing, video coding and machine learning.
  • Supported by:
    National Natural Science Foundation of China (61702336); Basic Research Project of Shenzhen Science and Innovation Commission (JCYJ20220810112354002)

Abstract:

Local Feature Selection (LFS) methods partition the sample space into multiple local regions and select an optimal feature subset for each region to reflect local heterogeneous information. However, existing LFS methods define a local region around every sample and search for its optimal feature subset, which results in low optimization efficiency and limited applicability. To address this issue, an evolutionary Bi-level adaptive Local Feature Selection (BiLFS) algorithm was proposed. The LFS problem was formulated as a bi-level optimization problem whose two decision variables are the feature subsets and the local regions to be optimized. At the upper level, Non-dominated Sorting Genetic Algorithm Ⅱ (NSGA-Ⅱ) was employed to find the optimal feature subsets for the selected local regions, with region purity and the ratio of selected features as the objective functions. At the lower level, based on the optimal feature subsets found at the upper level, local region clustering analysis was first used to obtain the center samples within each region, and local region fusion was then applied to eliminate unnecessary regions and update the population of necessary regions. Experimental results on 11 UCI datasets show that, compared with non-adaptive LFS methods based on evolutionary algorithms, BiLFS reaches 98.48% of their average classification accuracy while requiring only 9.51% of their average computation time, which greatly improves computational efficiency and brings it to the level of linear programming-based LFS methods. Visual analysis of the local regions selected for optimization by BiLFS during the iteration process indicates that its selection of necessary local regions is stable and reliable.

Key words: Feature Selection (FS), bi-level optimization, Genetic Algorithm (GA), multi-objective optimization, clustering
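
The two upper-level objectives named in the abstract, region purity and the ratio of selected features, can be illustrated with a small sketch. The abstract does not give the paper's exact definitions, so the purity used here (the share of a center sample's k nearest neighbours, measured only on the selected features, that carry the same class label) is an assumption made for illustration. All function names and the toy data are hypothetical. The sketch enumerates every non-empty feature mask and keeps the non-dominated ones; it does not implement NSGA-Ⅱ or the lower-level clustering and fusion steps.

```python
from itertools import product
from math import dist


def feature_ratio(mask):
    """Fraction of features switched on in a 0/1 mask (to be minimised)."""
    return sum(mask) / len(mask)


def _project(row, selected):
    """Keep only the selected feature columns of one sample."""
    return [row[j] for j in selected]


def region_purity(X, y, center, mask, k=3):
    """Assumed purity (to be maximised): share of the k nearest neighbours
    of `center`, with distance measured on the selected features only,
    that have the same label as `center`."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # an empty subset cannot describe a region
    c = _project(X[center], selected)
    others = [i for i in range(len(X)) if i != center]
    others.sort(key=lambda i: dist(_project(X[i], selected), c))
    neighbours = others[:k]
    return sum(y[i] == y[center] for i in neighbours) / len(neighbours)


def dominates(a, b):
    """a and b are (purity, ratio) pairs: higher purity, lower ratio wins."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def pareto_front(scored):
    """Keep the (mask, score) pairs that no other candidate dominates."""
    return [(m, s) for m, s in scored
            if not any(dominates(t, s) for _, t in scored)]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 0.0], [0.2, 9.0],
     [1.0, 5.1], [1.1, 0.2], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]

# Score every non-empty mask for the region centred on sample 0.
candidates = [m for m in product((0, 1), repeat=2) if any(m)]
scored = [(m, (region_purity(X, y, 0, m, k=2), feature_ratio(m)))
          for m in candidates]
front = pareto_front(scored)
print(front)
```

On this toy data the only non-dominated mask is the one that keeps feature 0 alone, since it gives a pure neighbourhood with half of the features. In an evolutionary setting the exhaustive enumeration would be replaced by a population of masks evolved with non-dominated sorting, because the number of masks grows exponentially with the number of features.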

CLC number: