Journal of Computer Applications ›› 2013, Vol. 33 ›› Issue (06): 1515-1518. DOI: 10.3724/SP.J.1087.2013.01515

• Network and Communications •

Semi-supervised Network Traffic Classification Method Based on Support Vector Machine

LI Pinghong1, WANG Yong2, TAO Xiaoling3

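  1. College of Computer Science and Engineering, Guilin University of Electronic Technology, Guilin, Guangxi 541004
  2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004
  3. College of Information and Communication, Guilin University of Electronic Technology, Guilin, Guangxi 541004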
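  • Received: 2012-12-12 Revised: 2013-02-20 Published: 2013-06-01 Online: 2013-06-05
  • Corresponding author: WANG Yong
  • About the authors: LI Pinghong (1984-), female, born in Chongqing, M.S. candidate; main research interest: network security. WANG Yong (1964-), male, born in Langzhong, Sichuan, professor, Ph.D.; main research interests: computer networks and information security. TAO Xiaoling (1977-), female, born in Jinhua, Zhejiang, associate research fellow, M.S.; main research interest: computer networks.
  • Supported by:

    National Natural Science Foundation of China (61163058, 61172053); Guangxi Natural Science Foundation (2011GXNSFB018076)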

A Semi-supervised Network Traffic Classification Method Based on Support Vector Machine

LI Pinghong1, WANG Yong2, TAO Xiaoling3

  1. College of Computer Science and Engineering, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  3. College of Information and Communication, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
  • Received: 2012-12-12 Revised: 2013-02-20 Online: 2013-06-05 Published: 2013-06-01
  • Contact: WANG Yong

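Abstract: To address the low accuracy, high overhead, and limited applicability of traditional network traffic classification methods, a semi-supervised network traffic classification method based on the Support Vector Machine (SVM) is proposed. During SVM training, the method uses incremental learning to determine the support vectors dynamically from the initial and newly added sample sets, which avoids unnecessary retraining and mitigates the loss of accuracy and the long classification time that the original classifier suffers when new samples appear. An improved semi-supervised Tri-training method trains the classifiers cooperatively, using a large number of unlabeled samples together with a small number of labeled samples to correct the classifiers repeatedly; this reduces the noisy data introduced by the auxiliary classifiers and overcomes the strict requirements that traditional co-verification places on the classification algorithm and the sample types. Experimental results show that the method clearly improves both the accuracy and the efficiency of network traffic classification.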

Key words: network traffic classification, support vector machine, semi-supervised, incremental learning, co-training
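The incremental-learning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear SVM trained by full-batch subgradient descent on the hinge loss, treats samples whose functional margin is at most `1 + tol` as the retained support vectors, and uses two made-up, standardized flow features. The names `train_linear_svm`, `support_vectors`, `incremental_update`, and the parameter `tol` are all hypothetical.

```python
def train_linear_svm(samples, labels, epochs=200, lam=0.01, lr=0.1):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    dim, n = len(samples[0]), len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wi for wi in w], 0.0
        for x, y in zip(samples, labels):
            # Only samples inside the margin contribute to the subgradient.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1:
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def margin(model, x, y):
    """Functional margin y * (w . x + b) of one sample."""
    w, b = model
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def support_vectors(model, samples, labels, tol=0.25):
    """Assumed selection rule: keep samples on or near the margin."""
    kept = [(x, y) for x, y in zip(samples, labels)
            if margin(model, x, y) <= 1 + tol]
    return [x for x, _ in kept], [y for _, y in kept]


def incremental_update(model, old_x, old_y, new_x, new_y):
    """Retrain on the old support vectors plus the new batch only."""
    sv_x, sv_y = support_vectors(model, old_x, old_y)
    return train_linear_svm(sv_x + new_x, sv_y + new_y), sv_x, sv_y


def predict(model, x):
    return 1 if margin(model, x, 1) >= 0 else -1


# Hypothetical flows: label +1 and -1 stand for two traffic classes.
init_x = [(2.0, 2.0), (4.0, 3.0), (3.0, 4.0), (4.0, 4.0),
          (-2.0, -2.0), (-4.0, -3.0), (-3.0, -4.0), (-4.0, -4.0)]
init_y = [1, 1, 1, 1, -1, -1, -1, -1]
new_x = [(1.0, 2.0), (2.0, 1.0), (-1.0, -2.0), (-2.0, -1.0)]
new_y = [1, 1, -1, -1]

model = train_linear_svm(init_x, init_y)
model, sv_x, sv_y = incremental_update(model, init_x, init_y, new_x, new_y)
```

The second training call sees only the retained samples and the new batch, which is where the saving over retraining on all accumulated data would come from in this simplified setting.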

Abstract: To address the low accuracy, high time cost, and limited application range of traditional network traffic classification, a semi-supervised network traffic classification method based on Support Vector Machine (SVM) was proposed. During SVM training, incremental learning was used to determine the support vectors from the initial and new sample sets, which avoided unnecessary repeated training and alleviated the drop in accuracy and the long classification time of the original classifier caused by newly arriving samples. An improved Tri-training method was also proposed to train multiple classifiers cooperatively: a large number of unlabeled samples and a small number of labeled samples were used to refine the classifiers repeatedly, which reduced the noisy data of the auxiliary classifiers and overcame the strict requirements that traditional co-verification imposes on classification algorithms and sample types. The experimental results show that the proposed method clearly improves the accuracy and efficiency of traffic classification.

Key words: network traffic classification, Support Vector Machine (SVM), semi-supervised, incremental learning, Tri-training
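The Tri-training idea mentioned in the abstract can be sketched in simplified form as follows. This is the basic scheme only (three classifiers; an unlabeled sample is pseudo-labeled for one classifier when the other two agree on it) and not the improved variant the paper proposes, whose details the abstract does not give. To keep the sketch short, a nearest-centroid learner stands in for the SVM, the error-rate checks of standard Tri-training are omitted, and the bootstrap is stratified so that no class disappears. All function names, class labels, and data points are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(samples, labels):
    """Stand-in base learner (not an SVM): one mean vector per class."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(pts) for col in zip(*pts)]
            for y, pts in groups.items()}


def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, model[y])))


def stratified_bootstrap(samples, labels, rng):
    """Resample with replacement inside each class."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    xs, ys = [], []
    for y, pts in by_class.items():
        for _ in pts:
            xs.append(rng.choice(pts))
            ys.append(y)
    return xs, ys


def tri_train(labeled_x, labeled_y, unlabeled_x, rounds=5, seed=0):
    rng = random.Random(seed)
    models = [fit_centroids(*stratified_bootstrap(labeled_x, labeled_y, rng))
              for _ in range(3)]
    previous = None
    for _ in range(rounds):
        pseudo = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Classifier i learns from samples the other two agree on.
            pseudo.append([(x, predict(models[j], x)) for x in unlabeled_x
                           if predict(models[j], x) == predict(models[k], x)])
        if pseudo == previous:  # pseudo-labels stopped changing
            break
        previous = pseudo
        models = [fit_centroids(labeled_x + [x for x, _ in extra],
                                labeled_y + [y for _, y in extra])
                  for extra in pseudo]
    return models


def vote(models, x):
    """Majority vote of the three classifiers."""
    return Counter(predict(m, x) for m in models).most_common(1)[0][0]


# Hypothetical flows described by two made-up features.
labeled_x = [(0.0, 1.0), (2.0, 1.0), (8.0, 9.0), (10.0, 9.0)]
labeled_y = ["web", "web", "p2p", "p2p"]
unlabeled_x = [(1.0, 0.0), (1.0, 2.0), (0.5, 1.5),
               (9.0, 8.0), (9.0, 10.0), (9.5, 8.5)]

models = tri_train(labeled_x, labeled_y, unlabeled_x)
web_guess = vote(models, (1.0, 1.0))
p2p_guess = vote(models, (9.0, 9.0))
```

Requiring agreement between the two other classifiers is what limits the noise each classifier receives from its peers; the paper's improvement targets that noise further, in ways this sketch does not attempt to reproduce.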

CLC Number: