Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (3): 688-694. DOI: 10.11772/j.issn.1001-9081.2021040789

• 2021 CCF Conference on Artificial Intelligence (CCFAI 2021)

基于ReliefF的层次分类在线流特征选择算法

张小清1,2, 王晨曦1,2, 吕彦1,2, 林耀进1,2

  1. 闽南师范大学 计算机学院,福建 漳州 363000
  2. 数据科学与智能应用福建省高校重点实验室,福建 漳州 363000
  • 收稿日期:2021-05-17 修回日期:2021-07-11 接受日期:2021-07-14 发布日期:2021-11-09 出版日期:2022-03-10
  • 通讯作者: 王晨曦
  • 作者简介:张小清(1998—),女,福建泉州人,硕士研究生,CCF会员,主要研究方向:数据挖掘
    吕彦(1997—),女,安徽宣城人,硕士研究生,CCF会员,主要研究方向:数据挖掘
    林耀进(1980—),男,福建漳州人,教授,博士,CCF会员,主要研究方向:数据挖掘。
  • 基金资助:
    国家自然科学基金资助项目(62076116);福建省自然科学基金资助项目(2020J01811)

Hierarchical classification online streaming feature selection algorithm based on ReliefF algorithm

Xiaoqing ZHANG1,2, Chenxi WANG1,2, Yan LYU1,2, Yaojin LIN1,2

  1. College of Computer Science, Minnan Normal University, Zhangzhou Fujian 363000, China
  2. Key Laboratory of Data Science and Intelligence Application, Fujian Province University, Zhangzhou Fujian 363000, China
  • Received:2021-05-17 Revised:2021-07-11 Accepted:2021-07-14 Online:2021-11-09 Published:2022-03-10
  • Contact: Chenxi WANG
  • About author:ZHANG Xiaoqing, born in 1998, M. S. candidate. Her research interests include data mining.
    LYU Yan, born in 1997, M. S. candidate. Her research interests include data mining.
    LIN Yaojin, born in 1980, Ph. D., professor. His research interests include data mining.
  • Supported by:
    National Natural Science Foundation of China (62076116); Natural Science Foundation of Fujian Province (2020J01811)

摘要:

在图像标注、疾病诊断等实际分类任务中,数据标记空间的类别通常存在着层次化结构关系,且伴随着特征的高维性。许多层次特征选择算法因不同的实际任务需求而提出,但这些已有的特征选择算法忽略了特征空间的未知性和不确定性。针对上述问题,提出一种基于ReliefF的面向层次分类学习的在线流特征选择算法OH_ReliefF。首先将类别之间的层次关系融入ReliefF算法中,定义一种新的面向层次化数据的特征权重计算算法HF_ReliefF;其次,利用特征对决策属性的划分能力动态选择重要特征;最后,基于特征之间的独立性对特征进行动态冗余分析。实验结果表明,与五种先进的在线流特征选择算法作对比,OH_ReliefF算法在K最邻近(KNN)分类器和拉格朗日支持向量机(LSVM)分类器的各个评价指标中都取得较优的结果,准确率最少提高7个百分点。

关键词: 特征选择, 在线流特征选择, 层次分类, ReliefF算法, 兄弟策略

Abstract:

In practical classification tasks such as image annotation and disease diagnosis, the classes in the label space usually have a hierarchical structure, and the features are high-dimensional. Many hierarchical feature selection algorithms have been proposed to meet different task requirements, but they ignore the fact that the feature space may be unknown and uncertain. To solve this problem, an online streaming feature selection algorithm for hierarchical classification learning based on ReliefF, named OH_ReliefF, was proposed. Firstly, the hierarchical relationship between classes was incorporated into the ReliefF algorithm to define a new feature weight calculation method for hierarchical data, HF_ReliefF. Secondly, important features were dynamically selected according to their ability to discriminate the decision attribute. Finally, dynamic redundancy analysis was performed on the features based on the independence between features. Experimental results show that, compared with five advanced online streaming feature selection algorithms, OH_ReliefF achieves better results on all evaluation metrics with both the K-Nearest Neighbor (KNN) classifier and the Lagrangian Support Vector Machine (LSVM) classifier, with accuracy improved by at least 7 percentage points.

Key words: feature selection, online streaming feature selection, hierarchical classification, ReliefF algorithm, sibling strategy
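
The three steps named in the abstract (hierarchy-aware feature weighting, online selection, redundancy analysis) can be illustrated with a minimal sketch. This is not the authors' HF_ReliefF or OH_ReliefF: the abstract gives no formulas, so everything below is an assumption made for illustration. The names feature_weight and stream_select, the thresholds weight_thr and corr_thr, the per-feature nearest-neighbor search, and the use of Pearson correlation as a stand-in for the independence test are all hypothetical. The only idea taken from the page is the "sibling strategy" keyword, read here as drawing ReliefF near-misses only from sibling classes in the hierarchy.

```python
import numpy as np

def feature_weight(f, y, siblings, k=1):
    """ReliefF-style weight of one feature; near-misses come only from sibling classes.

    Assumption: neighbors are searched on this single feature, because in a
    streaming setting the other features may not have arrived yet.
    """
    f = np.asarray(f, dtype=float)
    y = np.asarray(y)
    span = f.max() - f.min()
    if span == 0:                       # a constant feature carries no information
        return 0.0
    f = (f - f.min()) / span            # scale to [0, 1] so differences are comparable
    total, used = 0.0, 0
    for i in range(len(f)):
        diff = np.abs(f - f[i])
        hit_mask = (y == y[i])
        hit_mask[i] = False             # a sample is not its own neighbor
        miss_mask = np.isin(y, siblings.get(y[i], []))
        if not hit_mask.any() or not miss_mask.any():
            continue
        near_hit = np.sort(diff[hit_mask])[:k].mean()
        near_miss = np.sort(diff[miss_mask])[:k].mean()
        total += near_miss - near_hit   # reward separation from siblings, penalize spread within class
        used += 1
    return total / used if used else 0.0

def stream_select(X, y, siblings, weight_thr=0.0, corr_thr=0.9):
    """Process columns of X one at a time, as if features arrived in a stream."""
    selected = []
    for j in range(X.shape[1]):
        f = X[:, j]
        # Step 2 (assumed rule): keep only features whose weight exceeds a threshold.
        if feature_weight(f, y, siblings) <= weight_thr:
            continue
        # Step 3 (stand-in): treat high absolute Pearson correlation as redundancy.
        redundant = any(abs(np.corrcoef(f, X[:, s])[0, 1]) >= corr_thr for s in selected)
        if not redundant:
            selected.append(j)
    return selected

# Toy hierarchy: root -> {A, B}, A -> {0, 1}, B -> {2, 3}; siblings share a parent.
siblings = {0: [1], 1: [0], 2: [3], 3: [2]}
y = np.array([0, 0, 1, 1, 2, 2, 3, 3])
f0 = np.array([0.0, 0.1, 0.9, 1.0, 0.0, 0.1, 0.9, 1.0])   # separates siblings
f1 = 2 * f0 + 1                                           # linear copy of f0 (redundant)
f2 = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0])   # varies within each class only
f3 = np.array([0.0, 0.2, 1.0, 0.8, 1.0, 0.8, 0.0, 0.2])   # separates siblings, uncorrelated with f0
X = np.column_stack([f0, f1, f2, f3])

print(stream_select(X, y, siblings))   # prints [0, 3]
```

In this toy run, f1 is dropped as redundant with f0 and f2 is dropped for its negative weight. A fuller treatment would also revisit previously selected features when a new one arrives, which this sketch omits.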

CLC Number: