Journal of Computer Applications ›› 2014, Vol. 34 ›› Issue (11): 3206-3209. DOI: 10.11772/j.issn.1001-9081.2014.11.3206

• Network and Communications •

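Semi-supervised network traffic feature selection algorithm based on label extension

LIN Rongqiang1, LI Qing1, LI Ou2, LI Linlin1

  1. PLA Information Engineering University (Information System Engineering Institute, Information Engineering University, Zhengzhou Henan 450001, China)
  2. School of Information Engineering, PLA Information Engineering University
  • Received: 2014-06-10 Revised: 2014-07-26 Online: 2014-12-01 Published: 2014-11-01
  • Contact: LIN Rongqiang
  • About the authors: LIN Rongqiang (born 1990), male, from Zhangzhou, Fujian, master's student; research interests: computer communications and network security. LI Ou (born 1962), male, from Zhengzhou, Henan, professor, Ph.D.; research interests: wireless cognitive networks, self-organizing sensor networks, and communication network security. LI Qing (born 1976), female, from Zhengding, Hebei, associate professor, Ph.D.; research interests: communication network security and visible light communication. LI Linlin (born 1989), male, from Dingzhou, Hebei, master's student; research interests: computer communications and network security.
  • Supported by:

    Major Basic Research Project of National Security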

Abstract:

To address the sample-labeling bottleneck in network traffic feature selection, and the inability of existing semi-supervised methods to select strongly relevant features, a multi-class Semi-supervised Feature Selection based on Extension of Label (SFSEL) algorithm is proposed. Starting from a small number of labeled samples, the algorithm first extends class labels to the unlabeled samples with the K-means algorithm; it then applies the Multi-class Doubly regularized Support Vector Machine (MDrSVM) algorithm to perform feature selection on the multi-class data. In comparison experiments on the Moore data set against the semi-supervised feature selection algorithms Spectral, PCFRSC and SEFR, SFSEL achieved clearly higher classification accuracy and recall than the other algorithms while selecting clearly fewer features. The experimental results show that SFSEL effectively improves the relevance of the selected features and yields better network traffic classification performance.
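The two stages named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions, since the abstract does not give them: label extension is done here with K-means seeded at the class means of the labeled samples (one cluster per class), and MDrSVM is approximated by a one-vs-rest linear SVM with squared hinge loss and an elastic-net (L1 + L2) penalty, on the assumption that "doubly regularized" refers to that combined penalty. The names `extend_labels`, `select_features` and `demo`, the hyperparameter values, and the synthetic data are all hypothetical.

```python
import numpy as np


def extend_labels(X, y, n_iter=100):
    """Seeded K-means label extension; y uses -1 for unlabeled samples."""
    classes = np.unique(y[y >= 0])
    # Assumption: one cluster per class, seeded at the mean of its labeled samples.
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid.
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        new_centroids = np.stack([
            X[assign == k].mean(axis=0) if np.any(assign == k) else centroids[k]
            for k in range(len(classes))
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    extended = classes[assign]
    # Samples that were labeled from the start keep their original labels.
    extended[y >= 0] = y[y >= 0]
    return extended


def select_features(X, y, lam1=0.05, lam2=0.01, lr=0.1, n_iter=300):
    """One-vs-rest squared-hinge SVM with an elastic-net penalty (stand-in
    for MDrSVM). Returns (indices of features with nonzero weight, weights)."""
    # Standardize so that the L1 penalty treats all features on one scale.
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    n, d = Xs.shape
    classes = np.unique(y)
    W = np.zeros((len(classes), d))
    for k, c in enumerate(classes):
        t = np.where(y == c, 1.0, -1.0)  # +1 for class c, -1 for the rest
        w, b = np.zeros(d), 0.0
        for _ in range(n_iter):
            slack = np.maximum(0.0, 1.0 - t * (Xs @ w + b))
            # Gradient of mean squared hinge loss plus the L2 term.
            grad_w = -2.0 * (Xs.T @ (t * slack)) / n + lam2 * w
            grad_b = -2.0 * np.sum(t * slack) / n
            w = w - lr * grad_w
            b = b - lr * grad_b
            # Soft-thresholding: proximal step for the L1 term.
            w = np.sign(w) * np.maximum(np.abs(w) - lr * lam1, 0.0)
        W[k] = w
    selected = np.flatnonzero(np.abs(W).sum(axis=0) > 0)
    return selected, W


def demo(seed=0):
    """Synthetic data: 3 classes, features 0 and 1 informative, 6 noise features."""
    rng = np.random.default_rng(seed)
    means = np.zeros((3, 8))
    means[1, 0] = 10.0
    means[2, 1] = 10.0
    y_true = np.repeat(np.arange(3), 100)
    X = means[y_true] + rng.normal(size=(300, 8))
    y = np.full(300, -1)
    for c in range(3):
        y[np.flatnonzero(y_true == c)[:5]] = c  # only 5 labeled samples per class
    y_ext = extend_labels(X, y)
    selected, W = select_features(X, y_ext)
    return y_true, y_ext, selected, W


if __name__ == "__main__":
    y_true, y_ext, selected, W = demo()
    print("label-extension accuracy:", (y_ext == y_true).mean())
    print("selected feature indices:", selected)
```

Seeding the clusters at the labeled class means lets each cluster index map directly to a class label, which avoids a separate cluster-to-class matching step. The L1 term drives the weights of uninformative features toward zero, while the L2 term keeps the weights of correlated informative features stable. Both are design choices of this sketch and not details confirmed by the abstract.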

CLC number: