Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (5): 1266-1271. DOI: 10.11772/j.issn.1001-9081.2019091614

• Artificial intelligence •

Feature selection algorithm based on new forest optimization algorithm

XIE Qi1, XU Xu2, CHENG Gengguo1, CHEN Heping1

  1. School of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan, Hubei 430081, China
  2. Graduate School of Business, Grenoble Graduate School of Business, Grenoble 38000, France
  • Received: 2019-09-23 Revised: 2019-10-16 Online: 2020-05-10 Published: 2020-05-15
  • Contact: XIE Qi, born in 1984, Ph. D. candidate. His research interests include machine learning.
  • About the authors: XIE Qi, born in 1984 in Fuzhou, Fujian, Ph. D. candidate. His research interests include machine learning. XU Xu, born in 1995 in Wuhan, Hubei, M. S. candidate. His research interests include machine learning and quantitative investment. CHENG Gengguo, born in 1947 in Jixi, Anhui, Ph. D., professor and doctoral supervisor. His research interests include control theory and applications. CHEN Heping, born in 1956 in Wuhan, Hubei, M. S., professor, doctoral supervisor, CCF member. His research interests include artificial intelligence and big data.
  • Supported by:

    This work is partially supported by the National Natural Science Foundation of China (61702381, 61602351).

Abstract:

To address the shortcomings of the traditional feature selection algorithm based on the forest optimization algorithm in its initialization, candidate forest generation, and updating stages, a new feature selection algorithm based on the forest optimization algorithm is proposed. In the initialization stage, the Pearson correlation coefficient and L1 regularization replace the random initialization strategy. In the candidate forest generation stage, good and bad trees are separated and any shortfall is made up, which resolves the problem of incomplete sets of good and bad trees. In the updating stage, trees that have the same accuracy as the optimal tree but a different dimension are added to the forest. In the experiments, the proposed algorithm used the same data and parameters as the traditional algorithm and was tested on small-, medium-, and large-dimension data. The results show that, on two large-dimension and two medium-dimension datasets, the proposed algorithm achieves higher classification accuracy and stronger dimension reduction than the traditional algorithm, which supports its effectiveness for feature selection problems.

Key words: feature selection, L1 regularization, candidate forest, updating mechanism, forest optimization algorithm
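
The abstract names the Pearson correlation coefficient and L1 regularization as replacements for random initialization, but it does not say how the two are combined. The sketch below shows one assumed possibility, not the authors' method: a tree is a 0/1 feature mask, and the initial tree keeps every feature with a nonzero L1-regularized least-squares coefficient plus the `top_k` features most correlated with the label. The function names, `top_k`, `lam`, and the union rule are all hypothetical.

```python
import numpy as np


def pearson_scores(X, y):
    """Absolute Pearson correlation between each feature column and the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns (or labels) get score 0
    return np.abs(Xc.T @ yc) / denom


def lasso_ista(X, y, lam=0.1, n_iter=500):
    """Minimize (1/2n)||Xw - y||^2 + lam * ||w||_1 by proximal gradient (ISTA)."""
    n, d = X.shape
    w = np.zeros(d)
    # Step size 1/L, where L = ||X||_2^2 / n bounds the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        z = w - step * grad
        # Soft thresholding drives small coefficients exactly to zero.
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w


def initial_tree(X, y, top_k=2, lam=0.1):
    """Assumed seeding rule: union of L1-selected and top-k correlated features."""
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardize columns
    ys = y - y.mean()
    by_corr = np.argsort(pearson_scores(X, y))[::-1][:top_k]
    mask = lasso_ista(Xs, ys, lam) != 0
    mask[by_corr] = True
    return mask.astype(int)
```

Calling `initial_tree(X, y)` on a numeric feature matrix `X` and a 0/1 label vector `y` returns a mask with one entry per feature. A forest could then be seeded with variations of this mask instead of purely random ones, which is the general idea the abstract describes.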

CLC number: