A Semi-supervised Network Traffic Classification Method Based on Support Vector Machine

Journal of Computer Applications, 2013, Vol. 33, Issue (06): 1515-1518. DOI: 10.3724/SP.J.1087.2013.01515

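LI Pinghong¹, WANG Yong², TAO Xiaoling³

1. College of Computer Science and Engineering, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
3. College of Information and Communication, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China

Received: 2012-12-12 Revised: 2013-02-20 Online: 2013-06-05 Published: 2013-06-01
Corresponding author: WANG Yong
About the authors: LI Pinghong (1984-), female, born in Chongqing, master's candidate; main research interest: network security. WANG Yong (1964-), male, born in Langzhong, Sichuan, professor, Ph.D.; main research interests: computer networks and information security. TAO Xiaoling (1977-), female, born in Jinhua, Zhejiang, associate research fellow, M.S.; main research interest: computer networks.
Funding:
National Natural Science Foundation of China (61163058, 61172053); Natural Science Foundation of Guangxi (2011GXNSFB018076)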

Abstract

Abstract: To address the low accuracy, high overhead, and limited applicability of traditional network traffic classification methods, a semi-supervised network traffic classification method based on Support Vector Machine (SVM) is proposed. During SVM training, incremental learning is used to determine the support vectors dynamically from the initial and newly added sample sets. This avoids unnecessary retraining and mitigates the loss of accuracy and the long classification time that the original classifier suffers when new samples appear. An improved semi-supervised Tri-training method is then used to train the classifiers cooperatively: a large number of unlabeled samples and a small number of labeled samples are used to refine the classifiers repeatedly, which reduces the noisy data introduced by the auxiliary classifiers and overcomes the strict requirements that traditional co-verification places on the classification algorithm and the sample types. Experimental results show that the method markedly improves both the accuracy and the efficiency of network traffic classification.

Key words: network traffic classification, Support Vector Machine (SVM), semi-supervised learning, incremental learning, co-training (Tri-training)
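The co-training idea summarized in the abstract can be illustrated with a minimal sketch of the basic Tri-training agreement step: an unlabeled sample is added to one classifier's training set only when the other two classifiers assign it the same label. This is not the authors' improved variant, whose details are not given here. Several things are assumptions made for illustration: the base learner is a nearest-centroid stand-in rather than an SVM, classifier diversity comes from leaving out every third labeled sample rather than from bootstrap sampling, the error-rate check of the original Tri-training algorithm is omitted, and the class names `web` and `p2p` and all feature values are hypothetical.

```python
from collections import Counter


class CentroidClassifier:
    """Stand-in base learner (nearest class centroid); the paper uses SVMs."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for xi, yi in zip(X, y):
            acc = sums.setdefault(yi, [0.0] * len(xi))
            for d, v in enumerate(xi):
                acc[d] += v
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


def tri_train(X_lab, y_lab, X_unl, max_rounds=10):
    # Assumption: diversity comes from dropping every third labeled sample
    # for model i (the original algorithm uses bootstrap sampling instead).
    subsets = [[n for n in range(len(X_lab)) if n % 3 != i] for i in range(3)]
    models = [CentroidClassifier().fit([X_lab[n] for n in s], [y_lab[n] for n in s])
              for s in subsets]
    previous = [None, None, None]
    for _ in range(max_rounds):
        changed = False
        updated = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Pseudo-label x for model i only when the other two models agree.
            pseudo = [(x, models[j].predict(x)) for x in X_unl
                      if models[j].predict(x) == models[k].predict(x)]
            if pseudo != previous[i]:
                changed = True
                previous[i] = pseudo
                X_i = [X_lab[n] for n in subsets[i]] + [x for x, _ in pseudo]
                y_i = [y_lab[n] for n in subsets[i]] + [lab for _, lab in pseudo]
                updated.append(CentroidClassifier().fit(X_i, y_i))
            else:
                updated.append(models[i])
        models = updated
        if not changed:  # pseudo-label sets are stable, so stop refining
            break
    return models


def predict(models, x):
    """Final label by majority vote of the three classifiers."""
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature flow records: few labeled, several unlabeled.
X_lab = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.8, 1.2), (4.8, 5.2)]
y_lab = ["web", "p2p", "web", "p2p", "web", "p2p"]
X_unl = [(0.9, 1.1), (1.1, 0.9), (5.1, 4.9), (4.9, 5.1)]

models = tri_train(X_lab, y_lab, X_unl)
print(predict(models, (1.0, 0.9)), predict(models, (5.0, 5.1)))  # prints: web p2p
```

Swapping `CentroidClassifier` for an SVM with the same `fit`/`predict` interface would bring the sketch closer to the setting the abstract describes; the agreement rule itself does not depend on the base learner.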

CLC Number: TP393.07

LI Pinghong, WANG Yong, TAO Xiaoling. A Semi-supervised Network Traffic Classification Method Based on Support Vector Machine[J]. Journal of Computer Applications, 2013, 33(06): 1515-1518.
