Journal of Computer Applications, 2023, Vol. 43, Issue (9): 2673-2678. DOI: 10.11772/j.issn.1001-9081.2022091376

• 10th CCF Conference on Big Data (2022) •

Multi-similarity K-nearest neighbor classification algorithm with ordered pairs of normalized real numbers

Haoyang CUI¹, Hui ZHANG², Lei ZHOU², Chunming YANG¹, Bo LI¹, Xujian ZHAO¹

  1. School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
  2. School of Mathematics and Physics, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
  • Received: 2022-09-06 Revised: 2022-09-26 Accepted: 2022-10-08 Online: 2022-11-01 Published: 2023-09-10
  • Contact: Hui ZHANG
  • About the authors: CUI Haoyang, born in 1996 in Changzhi, Shanxi, M. S. candidate, CCF member. His research interests include machine learning and data mining.
    ZHOU Lei, born in 1981 in Meishan, Sichuan, Ph. D., lecturer. His research interests include fuzzy mathematics, quantum computation and quantum information.
    YANG Chunming, born in 1980 in Huaping, Yunnan, M. S., associate professor, CCF member. His research interests include data mining, natural language processing and big data.
    LI Bo, born in 1977 in Mianyang, Sichuan, M. S., lecturer, CCF member. His research interests include information security and information filtering.
    ZHAO Xujian, born in 1984 in Mianyang, Sichuan, Ph. D., associate professor, CCF member. His research interests include machine learning and natural language processing.
  • Supported by:
    Key Research and Development Project of Science and Technology Department of Sichuan Province (2021YFG0031); Provincial Scientific Research Institutes’ Achievement Transformation Project of Science and Technology Department of Sichuan Province (2022JDZH0035)

Abstract:

The performance of nearest neighbor classification depends heavily on the similarity or distance measure adopted, and the optimal measure is hard to select. To address these problems, a multi-similarity K-Nearest Neighbor classification algorithm based on Ordered Pairs of Normalized real numbers (OPNs-KNN) was proposed. First, the new mathematical theory of the Ordered Pair of Normalized real numbers (OPN) was introduced into machine learning, and all samples in the training and test sets were converted into OPNs using multiple similarity or distance measures, so that each OPN carried different similarity information. Then, an improved nearest neighbor algorithm was used to classify the OPNs, allowing the different similarity or distance measures to be combined and to complement one another, thereby improving classification performance. Experimental results show that, on Iris, seeds and other datasets, OPNs-KNN improves classification accuracy by 0.29 to 15.28 percentage points compared with six improved nearest neighbor classification algorithms, such as the distance-Weighted K-Nearest-Neighbor rule (WKNN), which verifies that the proposed algorithm can greatly improve classification performance.

Key words: machine learning, nearest neighbor algorithm, multi-similarity, classification algorithm, Ordered Pair of Normalized real numbers (OPN)
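
The abstract does not specify how an OPN is constructed or how the improved neighbor rule ranks OPNs, so the following is only an illustrative sketch of the general idea of letting two distance measures contribute jointly to a nearest neighbor vote. It is not the OPNs-KNN algorithm. The choice of Euclidean and Manhattan distances, the min-max scaling, the ranking by the sum of the pair, and the names `normalize` and `pair_knn_predict` are all assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def normalize(values):
    """Min-max scale a list of distances into [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pair_knn_predict(train_x, train_y, query, k=3):
    """Majority vote over the k training samples with the smallest pair sum."""
    # One normalized value per measure, so each training sample gets a pair.
    d1 = normalize([euclidean(x, query) for x in train_x])
    d2 = normalize([manhattan(x, query) for x in train_x])
    pairs = list(zip(d1, d2))
    # Assumed ranking rule: a smaller sum of the pair means a closer neighbor.
    order = sorted(range(len(train_x)), key=lambda i: pairs[i][0] + pairs[i][1])
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
train_y = ["a", "a", "a", "b", "b", "b"]
```

Calling `pair_knn_predict(train_x, train_y, (0.1, 0.1))` on this toy data votes among the three nearest samples under the combined pair. Any real use would need the OPN definition and ordering from the paper itself.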

CLC number: