Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (3): 683-687. DOI: 10.11772/j.issn.1001-9081.2021040760

• 2021 CCF Conference on Artificial Intelligence (CCFAI 2021) •


Doubly feature-weighted fuzzy support vector machine

Yunzhi QIU, Tinghua WANG, Xiaolu DAI

  1. School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, Jiangxi 341000, China
  • Received: 2021-05-12 Revised: 2021-06-09 Accepted: 2021-06-22 Online: 2021-11-09 Published: 2022-03-10
  • Contact: Tinghua WANG
  • About author: QIU Yunzhi, born in 1995, M. S. candidate. His research interests include statistical machine learning and big data analysis.
    DAI Xiaolu, born in 1997, M. S. candidate. Her research interests include statistical machine learning and big data analysis.
  • Supported by:
    National Natural Science Foundation of China(61966002)


Abstract:

Current feature-weighted Fuzzy Support Vector Machine (FSVM) algorithms consider the influence of feature weights on the membership function but ignore the application of those weights to the kernel function calculation during sample training. To address this shortcoming, a new FSVM algorithm that applies feature weighting to both the membership function and the kernel function calculation was proposed, namely Doubly Feature-Weighted FSVM (DFW-FSVM). Firstly, the weight of each feature was calculated by using Information Gain (IG). Secondly, the weighted Euclidean distance between each sample and its class center was calculated in the original input space based on the feature weights, and the membership function was constructed from this weighted distance; at the same time, the feature weights were applied to the kernel function calculation during sample training. Finally, the DFW-FSVM algorithm was constructed from the weighted membership function and the weighted kernel function. In this way, DFW-FSVM avoids being dominated by weakly relevant or irrelevant features. Comparative experiments were carried out on eight UCI datasets. Compared with the best results of five baseline algorithms (SVM, FSVM, Feature-Weighted SVM (FWSVM), Feature-Weighted FSVM (FWFSVM) and FSVM based on Centered Kernel Alignment (CKA-FSVM)), the accuracy and F1 value of DFW-FSVM increased by 2.33 and 5.07 percentage points respectively, indicating that the proposed DFW-FSVM has good classification performance.
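The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `mutual_info_classif` stands in as an IG estimate, the membership scaling (class radius plus a small `delta`) is a common FSVM convention assumed here, and the feature-weighted RBF kernel is realized by rescaling features before a standard SVM with per-sample penalties.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.svm import SVC

def dfw_fsvm_sketch(X, y, gamma=1.0, C=1.0, delta=1e-3):
    # Step 1: per-feature weights from an information-gain-style score,
    # normalized to sum to 1 (mutual information is used as the IG estimate)
    w = mutual_info_classif(X, y)
    w = w / (w.sum() + 1e-12)

    # Step 2: fuzzy membership from the feature-weighted Euclidean distance
    # between each sample and its own class center in the original space
    m = np.empty(len(y))
    for c in np.unique(y):
        idx = (y == c)
        center = X[idx].mean(axis=0)
        d = np.sqrt((((X[idx] - center) ** 2) * w).sum(axis=1))
        r = d.max() + 1e-12              # class radius
        m[idx] = 1.0 - d / (r + delta)   # closer to the center -> higher membership

    # Step 3: feature-weighted kernel -- scaling each feature by sqrt(w)
    # makes the standard RBF kernel equal a diagonally weighted RBF kernel
    Xw = X * np.sqrt(w)
    clf = SVC(C=C, kernel="rbf", gamma=gamma)

    # Step 4: the memberships enter the objective as per-sample penalty
    # weights, which is how FSVM down-weights noisy or outlying samples
    clf.fit(Xw, y, sample_weight=m)
    return clf, w, m
```

At prediction time, new samples must be rescaled with the same `sqrt(w)` before calling `clf.predict`, so that training and test points live in the same weighted feature space.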

Key words: Fuzzy Support Vector Machine (FSVM), feature-weighted, Information Gain (IG), kernel function, membership function

CLC number: