Journal of Computer Applications, 2013, Vol. 33, Issue (01): 80-82. DOI: 10.3724/SP.J.1087.2013.00080

• Network and Communications •

Internet traffic classification method based on selective clustering ensemble of mutual information

DING Yaojun1,2, CAI Wandong1

  1. School of Computer Science, Northwestern Polytechnical University, Xi'an Shaanxi 710129, China
  2. School of Information Engineering, Xianyang Normal University, Xianyang Shaanxi 712000, China
  • Received: 2012-08-01 Revised: 2012-08-28 Online: 2013-01-01 Published: 2013-01-09
  • Contact: DING Yaojun
  • About the authors: DING Yaojun (born 1980), male, from Xuchang, Henan, lecturer and Ph.D. candidate; main research interests: network and information security. CAI Wandong (born 1955), male, from Xi'an, Shaanxi, professor and doctoral supervisor; main research interests: network security and information countermeasures.
  • Supported by: National 863 Program of China (2009AA01Z424); Special Project of the Shaanxi Provincial Department of Education (12JK0933)

Abstract: Labeling Internet traffic is difficult, and a single clusterer generalizes poorly. To improve the accuracy of traffic classification, a selective clustering ensemble method based on Mutual Information (MI) is proposed. First, the Normalized Mutual Information (NMI) is computed between the K-means clustering results obtained with different initial cluster numbers K and the true distribution of traffic protocols in the training set. Next, the sequence of K values for the K-means base clusterers of the ensemble is selected according to the NMI values. Finally, a consensus function based on Quadratic Mutual Information (QMI) generates the consensus partition, and the resulting clusters are labeled with a semi-supervised method. The overall classification accuracy of the clustering ensemble method was compared with that of a single clustering algorithm on four test sets. The experimental results show that the overall accuracy of the clustering ensemble method can reach 90%. By applying a clustering ensemble model to network traffic classification, the proposed method improves both classification accuracy and the stability of classification across different datasets.

Key words: clustering ensemble, K-means, traffic classification, Mutual Information (MI)
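
The NMI-based selection step and the cluster labeling step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state which NMI normalization, which selection rule, or which semi-supervised labeling scheme is used, so the geometric-mean normalization, the "keep the top m values of K" rule, and the majority-vote labeling below are all assumptions, and the function names (`nmi`, `select_k_values`, `label_clusters`) are hypothetical. The K-means runs and the QMI consensus function are left out; the partitions are supplied as plain lists.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same flows."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI normalized by sqrt(H(a) * H(b)). Assumption: the abstract
    does not say which normalization is used."""
    denom = math.sqrt(entropy(a) * entropy(b))
    return mutual_information(a, b) / denom if denom > 0 else 0.0


def select_k_values(partitions_by_k, true_labels, top_m):
    """Keep the top_m values of K whose partitions have the highest NMI
    with the true protocol labels (assumed selection rule)."""
    ranked = sorted(
        partitions_by_k,
        key=lambda k: nmi(partitions_by_k[k], true_labels),
        reverse=True,
    )
    return ranked[:top_m]


def label_clusters(cluster_ids, known_labels):
    """Assumed semi-supervised labeling: each cluster takes the majority
    protocol among its labeled flows (None marks an unlabeled flow)."""
    votes = {}
    for cid, lab in zip(cluster_ids, known_labels):
        if lab is not None:
            votes.setdefault(cid, Counter())[lab] += 1
    return {cid: cnt.most_common(1)[0][0] for cid, cnt in votes.items()}


# Toy data: six flows, two protocols, three candidate partitions.
true_labels = ["http", "http", "http", "dns", "dns", "dns"]
partitions_by_k = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
    4: [0, 1, 0, 1, 2, 3],
}
selected = select_k_values(partitions_by_k, true_labels, top_m=2)
cluster_names = label_clusters(
    partitions_by_k[2], ["http", None, "http", "dns", None, None]
)
```

In a fuller pipeline, the base partitions for the selected K values would be combined by a consensus function before labeling; here `label_clusters` is applied to a single partition only to keep the sketch short.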

CLC number: