Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (9): 2673-2678. DOI: 10.11772/j.issn.1001-9081.2022091376
• 2022 10th CCF Conference on Big Data •
Haoyang CUI1, Hui ZHANG2, Lei ZHOU2, Chunming YANG1, Bo LI1, Xujian ZHAO1
Received: 2022-09-06
Revised: 2022-09-26
Accepted: 2022-10-08
Online: 2022-11-01
Published: 2023-09-10
Contact: Hui ZHANG
About author: CUI Haoyang, born in 1996 in Changzhi, Shanxi, M. S. candidate, CCF member. His research interests include machine learning and data mining.
Haoyang CUI, Hui ZHANG, Lei ZHOU, Chunming YANG, Bo LI, Xujian ZHAO. Multi-similarity K-nearest neighbor classification algorithm with ordered pairs of normalized real numbers[J]. Journal of Computer Applications, 2023, 43(9): 2673-2678.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022091376
Dataset | Samples | Dimensions | Classes | Dataset | Samples | Dimensions | Classes |
---|---|---|---|---|---|---|---|
Iris | 150 | 4 | 3 | wdbc | 569 | 30 | 2 |
seeds | 210 | 7 | 3 | wine | 178 | 13 | 3 |
sonar | 208 | 60 | 2 | sobar72 | 72 | 19 | 2 |
tae | 151 | 5 | 3 | glass | 214 | 9 | 6 |
Tab. 1 UCI datasets
Dataset | k | Accuracy/% | Recall/% | F1 score/% |
---|---|---|---|---|
glass | 1 | 97.99 | 97.01 | 96.28 |
Iris | 12 | 97.60 | 97.60 | 97.60 |
seeds | 4 | 93.81 | 93.81 | 93.85 |
sobar72 | 5 | 90.28 | 87.54 | 88.07 |
sonar | 1 | 83.51 | 83.18 | 83.33 |
tae | 1 | 59.80 | 59.82 | 59.79 |
wdbc | 3 | 96.84 | 96.16 | 96.58 |
wine | 1 | 94.44 | 94.94 | 94.40 |
Tab. 2 Performance of OPNs-KNN on each dataset under default parameters and corresponding k value
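The page does not state how accuracy, recall and F1 score are aggregated over classes. As a minimal sketch, assuming macro-averaging (an unweighted mean of per-class recall and per-class F1), the three values could be computed from true and predicted labels as follows. The function name `macro_metrics` is hypothetical and is not taken from the paper.

```python
def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro recall, macro F1) as fractions in [0, 1].

    Assumption: per-class scores are averaged with equal weight (macro),
    which the source page does not confirm.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))

    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    recalls, f1_scores = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        recalls.append(recall)
        f1_scores.append(f1)

    n = len(labels)
    return accuracy, sum(recalls) / n, sum(f1_scores) / n


# Small two-class example; multiply by 100 to get percentages as in the tables.
acc, rec, f1 = macro_metrics([0, 0, 1, 1], [0, 1, 1, 1])
```

Under macro-averaging the F1 score is a mean of per-class F1 values, so it need not equal the harmonic mean of the reported accuracy-level figures.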
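Dataset | Similarity measure 1 | Similarity measure 2 | q | k |
---|---|---|---|---|
glass | Manhattan distance | Minkowski distance (p=5) | 11.3 | 1 |
Iris | Euclidean distance | Minkowski distance (p=3) | 10.5 | 1 |
seeds | Cosine similarity | Minkowski distance (p=10) | 18.9 | 4 |
sobar72 | Euclidean distance | Minkowski distance (p=3) | 18.6 | 1 |
sonar | Minkowski distance (p=5) | Minkowski distance (p=7) | 1.7 | 1 |
tae | Minkowski distance (p=8) | Minkowski distance (p=9) | 0.4 | 1 |
wdbc | Cosine similarity | Minkowski distance (p=4) | 0.1 | 3 |
wine | Cosine similarity | Euclidean distance | 0.1 | 1 |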
Tab. 3 Parameter setting for OPNs-KNN with best performance
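For reference, the four similarity measures named in the table have standard textbook definitions, sketched below in plain Python. This only illustrates the individual measures; how OPNs-KNN pairs two of them and what role the parameter q plays are not described on this page, so neither is modelled here. The function names are illustrative.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def manhattan(x, y):
    """Manhattan distance, the Minkowski distance with p = 1."""
    return minkowski(x, y, 1)


def euclidean(x, y):
    """Euclidean distance, the Minkowski distance with p = 2."""
    return minkowski(x, y, 2)


def cosine_similarity(x, y):
    """Cosine of the angle between x and y (undefined for zero vectors)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
d_manhattan = manhattan(x, y)     # |1-4| + |2-6| + |3-3| = 7
d_euclidean = euclidean(x, y)     # sqrt(9 + 16 + 0) = 5
d_minkowski5 = minkowski(x, y, 5)  # (3**5 + 4**5) ** (1/5)
```

Note that the distances grow as points become less alike, whereas cosine similarity grows as they become more alike, so any scheme that combines the two kinds of measure has to reconcile their directions.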
Dataset | Accuracy/% | Recall/% | F1 score/% |
---|---|---|---|
glass | 98.74 | 96.63 | 97.25 |
Iris | 98.87 | 98.87 | 98.87 |
seeds | 95.29 | 95.29 | 95.32 |
sobar72 | 92.78 | 90.70 | 91.16 |
sonar | 88.94 | 88.57 | 88.79 |
tae | 68.87 | 68.83 | 68.82 |
wdbc | 97.07 | 96.43 | 96.83 |
wine | 94.44 | 94.94 | 94.40 |
Tab. 4 Optimal performance of OPNs-KNN on each dataset
Dataset | WKNN | LMPNN | LMKNN | MLMNN | WRKNN | WLMRKNN | OPNs-KNN |
---|---|---|---|---|---|---|---|
Iris | 96.21 | 94.21 | 93.68 | 93.49 | 96.37 | 96.59 | 98.87 |
wine | 82.88 | 86.10 | 86.19 | 87.20 | 92.03 | 92.54 | 94.44 |
sonar | 80.51 | 82.32 | 81.59 | 83.99 | 84.28 | 85.07 | 88.94 |
wdbc | 94.41 | 93.11 | 92.89 | 94.07 | 95.71 | 93.24 | 97.07 |
seeds | 91.16 | 91.30 | 92.25 | 91.09 | 95.00 | 94.42 | 95.29 |
tae | 53.59 | 58.71 | 57.98 | 57.47 | 57.76 | 58.76 | 68.87 |
Tab. 5 Comparison of classification accuracy of different algorithms on each dataset