Journal of Computer Applications, 2023, Vol. 43, Issue (10): 3121-3128. DOI: 10.11772/j.issn.1001-9081.2022101543

• Data science and technology •

Fuzzy-rough set based unsupervised dynamic feature selection algorithm

Lei MA1, Chuan LUO1, Tianrui LI2, Hongmei CHEN2

  1. College of Computer Science, Sichuan University, Chengdu Sichuan 610065, China
  2. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu Sichuan 611756, China
  • Received: 2022-10-17 Revised: 2022-11-28 Accepted: 2022-11-30 Online: 2023-10-07 Published: 2023-10-10
  • Contact: Chuan LUO
  • About the authors: MA Lei, born in 1997, from Chengdu, Sichuan (Hui ethnic group), M. S. candidate. His research interests include feature selection.
    LI Tianrui, born in 1969, from Putian, Fujian, Ph. D., professor, CCF member. His research interests include data mining and granular computing.
    CHEN Hongmei, born in 1971, from Chengdu, Sichuan, Ph. D., professor, CCF member. Her research interests include data mining and granular computing.
  • Supported by:
    National Natural Science Foundation of China (62076171); Natural Science Foundation of Sichuan Province (2022NSFSC0898)

Abstract:

Dynamic feature selection algorithms can substantially improve the efficiency of processing dynamic data, yet few unsupervised dynamic feature selection algorithms based on fuzzy-rough sets exist. To address this gap, an Unsupervised Dynamic Fuzzy-Rough set based Feature Selection (UDFRFS) algorithm is proposed for the setting in which features arrive in batches. First, a pseudo triangular norm and a new similarity relation are defined so that fuzzy relation values can be updated from the existing data rather than recomputed, which reduces unnecessary calculation. Second, when new features arrive, the existing feature selection result is reused: the dependency degree is used to judge whether the original feature part needs to be recalculated, which avoids redundant feature selection and further speeds up the process. Experimental results show that, compared with the static dependency-based unsupervised fuzzy-rough set feature selection algorithm, UDFRFS improves time efficiency by more than 90 percentage points while maintaining good classification accuracy and clustering performance.

Key words: feature selection, fuzzy-rough set, dynamic data, unsupervised feature selection, dependency
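
The idea of updating fuzzy relation values from existing data when a new batch of features arrives can be illustrated with a minimal sketch. It is not the UDFRFS algorithm: the abstract does not give the pseudo triangular norm or the new similarity relation, so the sketch assumes the standard minimum t-norm and a range-normalized similarity, and all function names are hypothetical. Under those assumptions, the stored relation matrix only needs to be combined with the relation of the new batch, instead of being rebuilt from every feature.

```python
from typing import List

Matrix = List[List[float]]


def feature_similarity(col: List[float]) -> Matrix:
    """Fuzzy similarity of one feature: 1 - |a(x) - a(y)| / range(a).

    Assumption: this common similarity stands in for the paper's
    new similarity relation, which the abstract does not define.
    """
    spread = max(col) - min(col)
    if spread == 0:
        # A constant feature cannot distinguish any pair of objects.
        return [[1.0] * len(col) for _ in col]
    return [[1.0 - abs(a - b) / spread for b in col] for a in col]


def combine(r1: Matrix, r2: Matrix) -> Matrix:
    """Combine two relations element-wise with the minimum t-norm.

    Assumption: the minimum t-norm replaces the paper's pseudo
    triangular norm.
    """
    return [[min(a, b) for a, b in zip(row1, row2)]
            for row1, row2 in zip(r1, r2)]


def batch_relation(columns: List[List[float]]) -> Matrix:
    """Relation induced by a batch of features, given as feature columns."""
    n = len(columns[0])
    rel = [[1.0] * n for _ in range(n)]
    for col in columns:
        rel = combine(rel, feature_similarity(col))
    return rel


def update_relation(old_rel: Matrix, new_columns: List[List[float]]) -> Matrix:
    """Fold a newly arrived feature batch into the stored relation."""
    return combine(old_rel, batch_relation(new_columns))


# Three objects, two existing features, then one newly arrived feature.
old_features = [[0.0, 0.5, 1.0], [2.0, 2.0, 4.0]]
new_features = [[1.0, 3.0, 2.0]]

r_old = batch_relation(old_features)
r_inc = update_relation(r_old, new_features)
r_full = batch_relation(old_features + new_features)
```

With the minimum t-norm, the incremental result `r_inc` matches the relation recomputed from all features (`r_full`), while only the new batch is processed on arrival. Whether the same equivalence holds for the pseudo triangular norm used by UDFRFS depends on definitions given in the full paper.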

CLC Number: