Journal of Computer Applications

Fuzzy-rough set based unsupervised dynamic feature selection algorithm

Lei Ma1, Chuan Luo1, Tianrui Li2, Hongmei Chen2

  • Received: 2022-10-14 Revised: 2022-11-28 Accepted: 2022-11-30 Online: 2023-04-12 Published: 2023-04-12
  • Contact: Chuan Luo

Unsupervised dynamic feature selection algorithm based on fuzzy-rough sets

Lei Ma1, Chuan Luo1, Tianrui Li2, Hongmei Chen2

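  1. College of Computer Science, Sichuan University, Chengdu 610065
  2. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756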

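  • Corresponding author: Chuan Luo
  • Funding:
    Three National Natural Science Foundation of China projects; one Natural Science Foundation of Sichuan Province project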

Abstract: In today's big-data settings, data often arrive in batches. Dynamic feature selection methods can run faster by reusing results produced at earlier stages of feature selection. Because few unsupervised dynamic feature selection methods based on fuzzy-rough sets exist, a novel Unsupervised Dynamic Fuzzy-Rough Feature Selection algorithm (UDFRFS) is proposed for the case where features arrive in batches. First, a new triangular norm and similarity relation are defined so that fuzzy relation values can be updated from existing data, which avoids unnecessary calculation. Second, after new features arrive, dependency is used to judge whether features selected at earlier stages need to be recalculated, which reduces redundant feature selection and lowers its time cost. Experimental results show that, compared with the static dependency-based unsupervised fuzzy-rough set feature selection algorithm, UDFRFS improves running time by more than 90% while maintaining good classification accuracy and clustering performance.
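
The idea of updating fuzzy relation values from existing data, rather than recomputing them, can be illustrated with a minimal sketch. It assumes the standard minimum triangular norm and a range-normalised similarity per feature, not the triangular norm and similarity relation that UDFRFS itself defines, and all function names are hypothetical. Under these assumptions, folding newly arrived features into a stored relation gives the same matrix as a static recomputation over all features.

```python
import numpy as np

def feature_similarity(column):
    """Fuzzy similarity matrix for one numeric feature: 1 - |a(x) - a(y)| / range."""
    column = np.asarray(column, dtype=float)
    value_range = column.max() - column.min()
    if value_range == 0.0:
        # A constant feature cannot distinguish any pair of samples.
        return np.ones((column.size, column.size))
    diff = np.abs(column[:, None] - column[None, :])
    return 1.0 - diff / value_range

def relation_from_scratch(data):
    """Static baseline: combine all per-feature relations with the min t-norm."""
    relation = np.ones((data.shape[0], data.shape[0]))
    for j in range(data.shape[1]):
        relation = np.minimum(relation, feature_similarity(data[:, j]))
    return relation

def update_relation(old_relation, new_features):
    """Incremental step: fold only the newly arrived features into the stored relation."""
    relation = old_relation.copy()
    for j in range(new_features.shape[1]):
        relation = np.minimum(relation, feature_similarity(new_features[:, j]))
    return relation

rng = np.random.default_rng(0)
first_batch = rng.random((6, 3))   # 6 samples, 3 features already processed
second_batch = rng.random((6, 2))  # 2 features arriving in a later batch

stored = relation_from_scratch(first_batch)
incremental = update_relation(stored, second_batch)
static = relation_from_scratch(np.hstack([first_batch, second_batch]))

# The incremental result matches the static recomputation.
print(np.allclose(incremental, static))
```

In this sketch the incremental step touches only the two new feature columns, whereas the static baseline revisits all five; the saving grows with the number of features already processed.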

Key words: feature selection, fuzzy-rough set, dynamic data, unsupervised feature selection, dependency

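Abstract (translated from Chinese): In today's big-data setting, data often arrive in batches. By reusing earlier computation results, dynamic feature selection algorithms can greatly improve the efficiency of processing dynamic data. Because relatively few unsupervised dynamic feature selection algorithms based on fuzzy-rough sets exist, an Unsupervised Dynamic Fuzzy-Rough Feature Selection algorithm (UDFRFS) is proposed for the case where features arrive in batches. First, a pseudo triangular norm and a new similarity relation are defined so that fuzzy relation values can be updated from existing data, which reduces unnecessary computation. Second, after new features arrive, the existing feature selection results are reused: the original features are judged by dependency, which reduces redundant feature selection and further speeds it up. Experimental results show that, compared with the static dependency-based unsupervised fuzzy-rough set feature selection algorithm, UDFRFS improves running time by more than 90 percentage points while maintaining good classification accuracy and clustering performance.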

Key words (translated from Chinese): feature selection, fuzzy-rough set, dynamic data, unsupervised feature selection, dependency
