Journal of Computer Applications, 2021, Vol. 41, Issue (9): 2602-2608. DOI: 10.11772/j.issn.1001-9081.2020111883

Topic: Cyberspace security

Intrusion detection model based on semi-supervised learning and three-way decision

ZHANG Shipeng, LI Yongzhong, DU Xiangtong

  1. School of Computer Science and Technology, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212100, China
  • Received: 2020-12-02  Revised: 2021-01-21  Published: 2021-09-10  Online: 2021-05-12
  • Corresponding author: LI Yongzhong
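  • Author profiles: ZHANG Shipeng (1994-), male, born in Suzhou, Anhui, M.S. candidate, CCF student member. Research interests: network information security. LI Yongzhong (1961-), male, born in Lanzhou, Gansu, professor, M.S., CCF member. Research interests: network information security, intelligent information processing. DU Xiangtong (1996-), male, born in Xuzhou, Jiangsu, M.S. candidate. Research interests: network information security.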
  • Supported by:
    Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX20_3163).

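Abstract: Existing intrusion detection models perform poorly on unknown attacks, and labeled data are extremely limited. To address these problems, an intrusion detection model based on Semi-Supervised Learning (SSL) and Three-Way Decision (3WD), named SSL-3WD, was proposed. SSL-3WD exploits the strong performance of 3WD under insufficient information to satisfy the SSL assumption that the data information is sufficient and redundant. Firstly, the 3WD theory was used to classify the network behavior data. Then, suitable "pseudo-labeled" samples were selected according to the classification results to form a new training set that expands the original dataset. Finally, the classification process was repeated until all network behavior data were classified. On the NSL-KDD dataset, the detection rate of the proposed model reached 97.7%, 5.8 percentage points higher than that of Multi-Tree, an adaptive ensemble-learning intrusion detection model with the highest detection rate among the compared methods. On the UNSW-NB15 dataset, the proposed model achieved an accuracy of 94.7% and a detection rate of 96.3%, which were 3.5 and 6.2 percentage points higher, respectively, than those of the best-performing compared method, an intrusion detection model based on the Stacked Non-symmetric Deep Auto-Encoder (SNDAE). Experimental results show that the proposed SSL-3WD model improves both the accuracy and the detection rate of network behavior detection.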
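The three steps summarized in the abstract (classify with 3WD, pseudo-label the confidently decided samples, repeat) can be illustrated as a self-training loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the nearest-centroid base learner, the distance-based attack score, the conventional threshold pair `alpha = 0.7` > `beta = 0.3`, the stopping rule, and the final 0.5 cut for samples that stay undecided are all hypothetical choices, and every function name is illustrative. Detection rate is computed with the usual definition, recall on attack samples, TP / (TP + FN); the paper's exact definition may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], List[float]]

# Three-way decision regions: accept as attack, reject (normal), or defer.
POS, NEG, BND = "attack", "normal", "defer"


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def fit_centroids(xs: List[Vector], ys: List[int]) -> Model:
    """Hypothetical base learner: one centroid per class (0 = normal, 1 = attack)."""
    normal = [x for x, y in zip(xs, ys) if y == 0]
    attack = [x for x, y in zip(xs, ys) if y == 1]
    return centroid(normal), centroid(attack)


def attack_score(model: Model, x: Vector) -> float:
    """Score in [0, 1]; values near 1 mean x is much closer to the attack centroid."""
    c_normal, c_attack = model
    d_n, d_a = math.dist(x, c_normal), math.dist(x, c_attack)
    return 0.5 if d_n + d_a == 0 else d_n / (d_n + d_a)


def three_way(score: float, alpha: float, beta: float) -> str:
    """Accept if score >= alpha, reject if score <= beta, otherwise defer (alpha > beta)."""
    if score >= alpha:
        return POS
    if score <= beta:
        return NEG
    return BND


def ssl_3wd(xs_l, ys_l, xs_u, alpha=0.7, beta=0.3, max_rounds=10) -> List[int]:
    """Self-training: only samples decided by 3WD are pseudo-labeled and added."""
    train_x, train_y = list(xs_l), list(ys_l)
    labels: Dict[int, int] = {}        # index into xs_u -> label (0 normal, 1 attack)
    pending = list(range(len(xs_u)))   # unlabeled samples not yet decided
    for _ in range(max_rounds):
        model = fit_centroids(train_x, train_y)
        deferred = []
        for i in pending:
            region = three_way(attack_score(model, xs_u[i]), alpha, beta)
            if region == BND:
                deferred.append(i)       # boundary region: wait for a better model
            else:
                labels[i] = 1 if region == POS else 0
                train_x.append(xs_u[i])  # pseudo-labeled sample expands the training set
                train_y.append(labels[i])
        if len(deferred) == len(pending):
            break                        # no new decisions this round
        pending = deferred
    # Assumption: samples still deferred get an ordinary two-way decision at 0.5.
    model = fit_centroids(train_x, train_y)
    for i in pending:
        labels[i] = 1 if attack_score(model, xs_u[i]) >= 0.5 else 0
    return [labels[i] for i in range(len(xs_u))]


def accuracy_and_detection_rate(y_true, y_pred) -> Tuple[float, float]:
    """Accuracy, and detection rate taken as recall on attacks: TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return correct / len(pairs), tp / (tp + fn)


if __name__ == "__main__":
    labeled_x = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.8, 5.2]]
    labeled_y = [0, 0, 1, 1]
    unlabeled_x = [[0.1, 0.3], [5.1, 4.9], [2.6, 2.4], [4.0, 4.2], [1.0, 0.8]]
    print(ssl_3wd(labeled_x, labeled_y, unlabeled_x))
```

Samples whose score falls between `beta` and `alpha` stay in the boundary region and are revisited after the pseudo-labeled samples have refined the model; this deferral, rather than forcing an immediate yes/no, is what separates a three-way decision from an ordinary two-way threshold.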

Key words: intrusion detection, Semi-Supervised Learning (SSL), Three-Way Decision (3WD), unknown attack, sufficient redundancy
