计算机应用 ›› 2013, Vol. 33 ›› Issue (02): 558-562.DOI: 10.3724/SP.J.1087.2013.00558

• 数据库技术 • 上一篇    下一篇

平均互信息和类别区分性修剪规则的KNN算法

周靖   

  1. 广东石油化工学院 计算机与电子信息学院,广东 茂名 525000
  • 收稿日期:2012-08-06 修回日期:2012-09-19 出版日期:2013-02-01 发布日期:2013-02-25
  • 通讯作者: 周靖
  • 作者简介:周靖(1980-),女,广东茂名人,实验师,硕士,主要研究方向:人工智能、数据挖掘。

KNN algorithm of average mutual information and class distinction pruning rules

ZHOU Jing   

  1. College of Computer and Electronic Information, Guangdong University of Petrochemical Technology, Maoming Guangdong 525000, China
  • Received:2012-08-06 Revised:2012-09-19 Online:2013-02-01 Published:2013-02-25
  • Contact: ZHOU Jing

摘要: 大规模的样本数量及其特征高维性影响着K最近邻(KNN)分类算法的分类性能。为此,提出一种具备降维、修剪机制的特征参数平均互信息和类别区分性的KNN改进算法AMI&CD-KNN。首先使用熵中平均互信息的概念,衡量特征参数体现类别特征信息的准确程度;然后采用特征参数相对类别的优势率及其在数据集中的分布概率描述类别区分性,用于体现特征参数提供类别信息量的大小;最后建立特征参数平均互信息和类别区分性的内在联系,设计样本修剪方法,从而达到在保证分类准确性的前提下,提高分类速度的目的。理论分析与仿真实验表明,与经典KNN及其他具备修剪机制的算法比较,提出的算法具有更高的分类泛化性。

关键词: 平均互信息, 类别区分性, 修剪规则, K最近邻算法, 类别特征

Abstract: Large sample size and characteristics of high dimension affects the K-Nearest Neighbor (KNN) classification algorithm in classification performance. Therefore, an improved KNN method named AMI&CD-KNN was put forward, which had feature dimension reduction and pruning mechanism based on average mutual information and class distinction of the feature parameters. Firstly, it used the concept of average mutual information in entropy to measure the accuracy of the feature parameters reflecting the class characteristic information. Secondly, it described the class distinction through the feature parameters odd ratio relative class and its distribution probability in the dataset, measured the size of the amount of class information provided by feature parameters. Finally, the relation was established between average mutual information and class distinction, and the sample pruning method was designed. Therefore, the classification was speeded up while ensuring the classification accuracy. The theoretical analysis and simulation experiment show that compared with the KNN and other algorithms with pruning mechanism, the improved algorithm has better classification generalization.

Key words: average mutual information, class distinction, pruning rule, K-Nearest Neighbor (KNN) algorithm, class feature

中图分类号: