Journal of Computer Applications, 2025, Vol. 45, Issue (4): 1035-1041. DOI: 10.11772/j.issn.1001-9081.2024030366
• Artificial intelligence •
Junyi ZHU1,2, Leilei CHANG1,2, Xiaobin XU1,2, Zhiyong HAO3,4, Haiyue YU4, Jiang JIANG4
Received: 2024-04-02
Revised: 2024-06-20
Accepted: 2024-06-21
Online: 2024-10-11
Published: 2025-04-10
Contact: Xiaobin XU
About author: ZHU Junyi, born in 2000, a native of Wenzhou, Zhejiang, M. S. candidate. His research interests include machine learning and data processing.
Junyi ZHU, Leilei CHANG, Xiaobin XU, Zhiyong HAO, Haiyue YU, Jiang JIANG. Self-supervised learning method using minimal prior knowledge[J]. Journal of Computer Applications, 2025, 45(4): 1035-1041.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024030366
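Method | Type of modeling samples | Label requirement | Result
---|---|---|---
Supervised learning | Labeled samples | Labels must accurately reflect the features | With labels
Unsupervised learning | Unlabeled samples | No labels required | Without labels
Self-supervised learning | Unlabeled samples, or only a few labeled samples | Labels must fit the pretext task / anchor labels must be accurate | With labels
Proposed method | Unlabeled samples, or only a few labeled samples | Label distribution of the samples is required / anchor labels must be accurate | With labels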
Tab. 1 Comparison of sample requirements and sample characteristics for different methods
Type | No. | Dataset | Label 0 | Label 1
---|---|---|---|---
Imbalanced labels | 1 | Appendicitis | 21 | 85
Imbalanced labels | 2 | Transfusion | 178 | 570
Imbalanced labels | 3 | Immunotherapy | 19 | 71
Imbalanced labels | 4 | Record | 70 | 229
Imbalanced labels | 5 | Sonar-72 | 21 | 51
Balanced labels | 6 | heart_stalog | 120 | 150
Balanced labels | 7 | Somerville Happiness Survey | 72 | 71
Balanced labels | 8 | credit-a | 357 | 296
Balanced labels | 9 | waveform_1_2 | 1653 | 1655
Balanced labels | 10 | Caesarian | 34 | 46
Tab. 2 Information of UCI datasets
Category | Method | Appendicitis | Transfusion | Immunotherapy | Record | Sonar-72
---|---|---|---|---|---|---
Unsupervised clustering | K-means | 79.25 | 72.33 | 72.22 | 59.20 | 77.78
Unsupervised clustering | DBSCAN | 71.69 | 70.86 | 68.89 | 62.54 | 75.00
Unsupervised clustering | Hierarchical | 68.87 | 75.80 | 76.67 | 65.89 | 76.17
Unsupervised clustering | SOM | 73.58 | 65.11 | 70.00 | 51.84 | 77.78
Supervised classification | C4.5 | 84.94 | 72.71 | 74.59 | 74.25 | 67.97
Supervised classification | KNN | 86.02 | 74.65 | 74.45 | 77.19 | 80.89
Supervised classification | BPNN | 88.64 | 75.87 | 89.81 | 62.89 | 88.89
Supervised classification | SVM | 88.18 | 76.69 | 75.87 | 68.82 | 82.91
Proposed method | Clustering initialization | 83.84 | 77.78 | 79.63 | 68.63 | 80.46
Proposed method | Distance initialization | 83.58 | 76.54 | 76.17 | 76.80 | 79.16
Tab. 3 Accuracy comparison of different methods on UCI datasets with imbalanced labels
Method | heart_stalog | Somerville Happiness Survey | credit-a | waveform_1_2 | Caesarian
---|---|---|---|---|---
C4.5 | 81.11 | 64.43 | 83.20 | 76.50 | 52.92
KNN | 85.60 | 56.78 | 69.30 | 80.90 | 55.63
BPNN | 86.80 | 55.43 | 65.06 | 92.12 | 41.46
SVM | 83.75 | 59.61 | 84.90 | 84.30 | 59.91
Proposed method | 78.66 | 66.46 | 75.00 | 89.73 | 59.71
Tab. 4 Accuracy comparison of proposed method and supervised classification methods on UCI datasets with balanced labels
Method | Appendicitis | | Transfusion | | Immunotherapy | | Record | | Sonar-72 | |
---|---|---|---|---|---|---|---|---|---|---|
 | Max | Avg | Max | Avg | Max | Avg | Max | Avg | Max | Avg |
TabNet | 86.36 | 79.70 | 78.07 | 74.76 | 83.33 | 74.07 | 78.26 | 73.00 | 80.00 | 68.00 |
SubTab | 86.36 | 79.39 | 77.33 | 73.51 | 83.33 | 72.59 | 81.67 | 76.00 | 86.11 | 75.96 |
Proposed method | 87.74 | 78.07 | 82.22 | 78.26 | 86.11 | |||||
Method | heart_stalog | | Somerville Happiness Survey | | credit-a | | waveform_1_2 | | Caesarian | |
 | Max | Avg | Max | Avg | Max | Avg | Max | Avg | Max | Avg |
TabNet | 66.67 | 55.14 | 68.97 | 55.17 | 66.41 | 56.03 | 94.86 | 68.75 | 55.42 | |
SubTab | 81.25 | 72.50 | 68.97 | 59.77 | 80.92 | 72.88 | 93.66 | 92.63 | 68.75 | |
Proposed method | 85.42 | 71.13 | 87.44 | 94.90 | 89.73 | 63.75 | 59.71 |
Tab. 5 Accuracy comparison of proposed method and mainstream self-supervised learning methods