Journal of Computer Applications, 2021, Vol. 41, Issue (6): 1694-1700. DOI: 10.11772/j.issn.1001-9081.2020091370

Special Issue: Data Science and Technology

Parameter independent weighted local mean-based pseudo nearest neighbor classification algorithm

CAI Ruiguang, ZHANG Desheng, XIAO Yanting   

  1. Faculty of Science, Xi'an University of Technology, Xi'an Shaanxi 710054, China
  • Received: 2020-09-07 Revised: 2020-10-24 Online: 2021-06-10 Published: 2020-11-10
  • Supported by:
    This work is partially supported by the Young Scientists Fund of the National Natural Science Foundation of China (11801438).

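  • Corresponding author: CAI Ruiguang
  • About the authors: CAI Ruiguang (1996-), female, born in Baoji, Shaanxi, M. S. candidate. Her research interests include data mining and classification analysis. ZHANG Desheng (1964-), male, born in Yongshou, Shaanxi, Ph. D., professor. His research interests include probability theory and mathematical statistics. XIAO Yanting (1981-), female, born in Huxian, Shaanxi, Ph. D., associate professor. Her research interests include probability theory and mathematical statistics.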

Abstract: To address the problems that the Local Mean-based Pseudo Nearest Neighbor (LMPNN) algorithm is sensitive to the value of k and ignores the differing influence of individual attributes on the classification results, a Parameter Independent Weighted Local Mean-based Pseudo Nearest Neighbor classification (PIW-LMPNN) algorithm was proposed. Firstly, the Success-History based parameter Adaptation for Differential Evolution (SHADE) algorithm, a recent variant of the differential evolution algorithm, was applied to the training set samples to obtain the optimal k value and a set of class-related optimal weights. Secondly, when computing the distance between samples, a different weight was assigned to each attribute of each class, and the test set samples were then classified. Finally, simulation experiments were performed on 15 real datasets, and the proposed algorithm was compared with eight other classification algorithms. The results show that the proposed algorithm improves the classification accuracy and the F1 value by up to about 28 and 23.1 percentage points, respectively. Meanwhile, the results of the Wilcoxon signed-rank test, the Friedman rank variance test and the Hollander-Wolfe pairwise comparison of treatments show that the proposed algorithm has clear advantages over the other eight classification algorithms in terms of classification accuracy and k value selection.
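The classification step described in the abstract (LMPNN with a class-specific weight on each attribute in the distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a weighted Euclidean distance d_w(x, y) = sqrt(sum_l w_l (x_l - y_l)^2) with one weight vector per class, used both to pick each class's k nearest neighbors and to measure the distance to the local mean vectors, and it applies the 1/i weight to the i-th local mean as LMPNN is commonly described. The names `weighted_dist` and `piw_lmpnn_predict` and the dict-of-arrays weight format are hypothetical.

```python
import numpy as np


def weighted_dist(x, Y, w):
    """Weighted Euclidean distance from x to every row of Y (assumed form)."""
    return np.sqrt(np.sum(w * (Y - x) ** 2, axis=1))


def piw_lmpnn_predict(X_train, y_train, X_test, k, weights):
    """Class-weighted LMPNN sketch.

    weights maps each class label to a 1-D array of per-attribute weights
    (hypothetical encoding of the class-related weights).
    """
    classes = np.unique(y_train)
    preds = []
    for x in X_test:
        scores = []
        for c in classes:
            Xc, w = X_train[y_train == c], weights[c]
            kc = min(k, len(Xc))  # a class may hold fewer than k samples
            nn = Xc[np.argsort(weighted_dist(x, Xc, w))[:kc]]
            ranks = np.arange(1, kc + 1)
            # Local mean vectors: m_i is the mean of the i nearest neighbors
            local_means = np.cumsum(nn, axis=0) / ranks[:, None]
            # Pseudo nearest neighbor distance: sum_i (1/i) * d_w(x, m_i)
            scores.append(np.sum(weighted_dist(x, local_means, w) / ranks))
        # Assign the class whose pseudo nearest neighbor is closest
        preds.append(classes[int(np.argmin(scores))])
    return np.array(preds)


X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])
w = {0: np.ones(2), 1: np.ones(2)}
pred = piw_lmpnn_predict(X, y, np.array([[0.05, 0.1], [1.0, 0.95]]), k=2, weights=w)
print(pred)  # [0 1]
```

In PIW-LMPNN, k and the per-class weight vectors would come from the SHADE search on the training set rather than being fixed by hand as in this toy call.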

Key words: Local Mean-based Pseudo Nearest Neighbor (LMPNN) algorithm, feature weighting, optimization model, Success-History based parameter Adaptation for Differential Evolution (SHADE), parameter adaptation
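The abstract names SHADE as the optimizer for k and the class-related weights. The sketch below shows SHADE's mechanics as the method is usually described (current-to-pbest/1 mutation with an external archive, binomial crossover, and success-history memories for F and CR updated with improvement-weighted means) on a toy sphere function. It is an illustration under assumptions, not the paper's code: the population size, memory size, generation budget and midpoint bound repair are illustrative choices, and `shade_minimize` is a hypothetical name.

```python
import numpy as np


def shade_minimize(f, bounds, pop_size=30, memory_size=10, max_gen=300, seed=0):
    """Minimize f over a box with a compact SHADE-style differential evolution."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)
    fit = np.array([f(x) for x in pop])
    m_cr = np.full(memory_size, 0.5)  # success-history memory for CR
    m_f = np.full(memory_size, 0.5)   # success-history memory for F
    archive = []                      # parents replaced by better trials
    k = 0                             # next memory slot to overwrite
    for _ in range(max_gen):
        s_cr, s_f, delta, replaced = [], [], [], []
        new_pop, new_fit = pop.copy(), fit.copy()
        order = np.argsort(fit)
        union = np.vstack([pop] + archive) if archive else pop
        for i in range(pop_size):
            r = rng.integers(memory_size)
            cr = float(np.clip(rng.normal(m_cr[r], 0.1), 0.0, 1.0))
            f_i = 0.0
            while f_i <= 0.0:  # Cauchy sample; regenerate if non-positive
                f_i = m_f[r] + 0.1 * rng.standard_cauchy()
            f_i = min(f_i, 1.0)
            # current-to-pbest/1: pbest comes from the top p*N individuals
            p = rng.uniform(2.0 / pop_size, 0.2)
            pbest = pop[order[rng.integers(max(2, int(round(p * pop_size))))]]
            r1 = rng.choice([j for j in range(pop_size) if j != i])
            r2 = rng.choice([j for j in range(len(union)) if j not in (i, r1)])
            v = pop[i] + f_i * (pbest - pop[i]) + f_i * (pop[r1] - union[r2])
            # Midpoint repair for components that leave the box
            v = np.where(v < lo, (lo + pop[i]) / 2, v)
            v = np.where(v > hi, (hi + pop[i]) / 2, v)
            # Binomial crossover; j_rand forces at least one mutant component
            mask = rng.random(dim) < cr
            mask[rng.integers(dim)] = True
            u = np.where(mask, v, pop[i])
            fu = f(u)
            if fu <= fit[i]:
                if fu < fit[i]:  # record only strict improvements
                    replaced.append(pop[i].copy())
                    s_cr.append(cr)
                    s_f.append(f_i)
                    delta.append(fit[i] - fu)
                new_pop[i], new_fit[i] = u, fu
        archive.extend(replaced)
        while len(archive) > pop_size:  # keep |A| <= N by random deletion
            archive.pop(int(rng.integers(len(archive))))
        if s_cr:
            w = np.array(delta) / np.sum(delta)  # improvement-based weights
            s_cr, s_f = np.array(s_cr), np.array(s_f)
            m_cr[k] = np.sum(w * s_cr)                     # weighted arithmetic mean
            m_f[k] = np.sum(w * s_f**2) / np.sum(w * s_f)  # weighted Lehmer mean
            k = (k + 1) % memory_size
        pop, fit = new_pop, new_fit
    best = int(np.argmin(fit))
    return pop[best], fit[best]


def sphere(x):
    return float(np.sum(x**2))


x_best, f_best = shade_minimize(sphere, [(-5.0, 5.0)] * 5)
print(f_best)
```

To tune a classifier such as PIW-LMPNN, one could hypothetically encode k and the flattened class weight vectors in a single real vector, round the k component, and let f return the classification error on the training set; the exact encoding and objective used in the paper are not given in this abstract.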

CLC Number: