Journal of Computer Applications, 2022, Vol. 42, Issue (1): 109-114. DOI: 10.11772/j.issn.1001-9081.2021010128

• Data science and technology •

Dynamic relevance based feature selection algorithm

Yongbo CHEN, Qiaoqin LI, Yongguo LIU

  1. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China
  • Received: 2021-01-25 Revised: 2021-03-29 Accepted: 2021-05-17 Online: 2021-06-04 Published: 2022-01-10
  • Contact: Yongguo LIU
  • About author: CHEN Yongbo, born in 1995, M. S. candidate. His research interests include cloud computing and big data.
    LI Qiaoqin, born in 1972, Ph. D., associate professor. Her research interests include Internet+healthcare, machine learning, and Internet of things.
    LIU Yongguo, born in 1974, Ph. D., professor. His research interests include digital healthcare, computing health, artificial intelligence, and big data.
  • Supported by:
    National Key Research and Development Program of China (2017YFC1703905); National Natural Science Foundation of China (81803851); Key Research and Development Program of Sichuan Province (2020YFS0372)

基于动态相关性的特征选择算法

陈永波, 李巧勤, 刘勇国

  1. 电子科技大学 信息与软件工程学院,成都 610054
  • 通讯作者: 刘勇国
  • 作者简介:陈永波(1995—),男,内蒙古赤峰人,硕士研究生,主要研究方向:云计算、大数据
    李巧勤(1972—),女,重庆人,副教授,博士,主要研究方向:互联网+健康医疗、机器学习、物联网
    刘勇国(1974—),男,四川绵阳人,教授,博士,主要研究方向:数字医疗、计算健康、人工智能、大数据。
  • 基金资助:
    国家重点研发计划项目(2017YFC1703905);国家自然科学基金资助项目(81803851);四川省重点研发计划项目(2020YFS0372)

Abstract:

Feature selection removes irrelevant features from the original dataset and selects a good feature subset, which helps avoid the curse of dimensionality and improves the performance of learning algorithms. During feature selection, the Dynamic Change of Selected Feature with the class (DCSF) algorithm considers only the dynamically changing information between the selected features and the classes, and ignores the interaction relevance between the candidate features and the selected features. To solve this problem, a Dynamic Relevance based Feature Selection (DRFS) algorithm was proposed. In the proposed algorithm, conditional mutual information was used to measure the conditional relevance between the selected features and the classes, and interaction information was used to measure the synergy between the candidate features and the selected features, so that relevant features are selected and redundant features are removed to obtain a good feature subset. Simulation results show that, compared with existing algorithms, the proposed algorithm effectively improves the classification accuracy obtained with feature selection.

Key words: feature selection, information entropy, mutual information, conditional mutual information, interaction information
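
The abstract names the information-theoretic measures that DRFS builds on but does not give the scoring formula, so the following is only a minimal sketch of those measures for discrete data, not the authors' algorithm. The function names are hypothetical, and the sign convention for interaction information (conditional mutual information minus mutual information, so that a positive value indicates synergy) is an assumption, since conventions differ across the literature.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Empirical joint entropy, in bits, of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def interaction_information(x, y, z):
    """I(X;Y;Z) = I(X;Y|Z) - I(X;Y); positive means synergy (assumed convention)."""
    return conditional_mutual_information(x, y, z) - mutual_information(x, y)


# Toy data (hypothetical): the class is the XOR of two binary features,
# so neither feature is informative alone but the pair is fully informative.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
c = [a ^ b for a, b in zip(f1, f2)]

relevance = mutual_information(f1, c)
conditional_relevance = conditional_mutual_information(f1, c, f2)
synergy = interaction_information(f1, c, f2)
```

On this toy data the candidate feature `f1` has zero mutual information with the class on its own, while its conditional relevance given `f2` and the interaction information are both one bit. This is the kind of feature interaction that a criterion based only on individual relevance would miss.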

摘要:

特征选择是从原始数据集中去除无关的特征并选择良好的特征子集,可以避免维数灾难和提高学习算法的性能。为解决已选特征和类别动态变化(DCSF)算法在特征选择过程中只考虑已选特征和类别之间动态变化的信息量,而忽略候选特征和已选特征的交互相关性的问题,提出了一种基于动态相关性的特征选择(DRFS)算法。该算法采用条件互信息度量已选特征和类别的条件相关性,并采用交互信息度量候选特征和已选特征发挥的协同作用,从而选择相关特征并且去除冗余特征以获得优良特征子集。仿真实验表明,与现有算法相比,所提算法能有效地提升特征选择的分类准确率。

关键词: 特征选择, 信息熵, 互信息, 条件互信息, 交互信息
