Journal of Computer Applications ›› 2013, Vol. 33 ›› Issue (10): 2734-2738.

• Network and Communications •

P2P traffic identification method based on K-means and twin support vector machine

郭伟1,王西闯1,肖振久2   

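  1. College of Software, Liaoning Technical University, Huludao, Liaoning 125105
  2. School of Computer, Communication University of China, Beijing 100024
  • Received: 2013-04-19  Revised: 2013-06-17  Published: 2013-10-01  Online: 2013-11-01
  • Corresponding author: WANG Xichuang (王西闯)
  • About the authors: GUO Wei (born 1970), female, from Fuxin, Liaoning, associate professor, main research interest: P2P traffic control; WANG Xichuang (born 1988), male, from Xuchang, Henan, master's student, main research interest: P2P traffic identification; XIAO Zhenjiu (born 1968), male, from Fuxin, Liaoning, associate professor, main research interest: information security.
  • Funding: National Natural Science Foundation of China; Beijing Natural Science Foundation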

P2P traffic identification method based on K-means and twin support vector machine

GUO Wei1,WANG Xichuang1,XIAO Zhenjiu2   

  1. College of Software, Liaoning Technical University, Huludao, Liaoning 125105, China
  2. School of Computer, Communication University of China, Beijing 100024, China
  • Received: 2013-04-19  Revised: 2013-06-17  Online: 2013-11-01  Published: 2013-10-01
  • Contact: WANG Xichuang

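Abstract: Supervised machine learning methods commonly used for P2P traffic identification generally incur a high time cost. To address this, a classifier is built with a twin support vector machine (TWSVM), whose time cost is one quarter that of the standard support vector machine (SVM). A K-means ensemble method is used to quickly generate labeled sample sets, and these sets are combined to form the training samples of the TWSVM. Finally, the trained TWSVM classification model is used to identify P2P traffic. Experimental results show that the method combining the K-means ensemble with the TWSVM is far superior to the standard SVM in the time cost, accuracy, and stability of P2P traffic identification.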
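
The one-quarter figure is consistent with the usual complexity argument for twin SVMs. The abstract does not give the derivation, so the following is only a sketch under two assumptions: solving a quadratic program costs roughly cubic time in the number of samples, and the two classes each hold about m/2 of the m training samples. A standard SVM then solves one problem of size m, while a TWSVM solves two problems of size m/2:

```latex
\[
\frac{T_{\mathrm{TWSVM}}}{T_{\mathrm{SVM}}}
\approx \frac{2\,(m/2)^{3}}{m^{3}}
= \frac{2}{8}
= \frac{1}{4}
\]
```

With strongly unbalanced classes the ratio would differ from this estimate.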

Keywords: P2P traffic identification, supervised machine learning, twin support vector machine, K-means ensemble, time cost

Abstract: Most supervised machine learning methods used for P2P traffic identification have a high time cost. To address this, the twin support vector machine (TWSVM), whose time cost is a quarter of that of the standard support vector machine (SVM), is used to build the classifier. A K-means ensemble is used to generate labeled sample sets, which are combined to form the training samples of the TWSVM. Finally, the trained classification model is used to identify P2P traffic. Experimental results show that the method based on the K-means ensemble and TWSVM significantly reduces the time cost of P2P traffic identification and achieves higher accuracy and better stability than the standard SVM.
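
The twin SVM idea named in the abstract fits two non-parallel hyperplanes, each close to one class and roughly unit distance from the other, and assigns a sample to the class whose plane is nearer. The abstract does not give the formulation, so the sketch below swaps in the least-squares twin SVM variant, which replaces the two quadratic programs with two small linear systems; it is not the TWSVM solver evaluated in the paper. The function names, the two-dimensional flow features, and the toy values are hypothetical, and the K-means ensemble labeling step is omitted (labeled samples are assumed to be given).

```python
import math


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def gram(rows):
    """Return rows^T rows for a list of equal-length row vectors."""
    n = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]


def fit_plane(near, far, c=1.0, eps=1e-8):
    """Fit u = (w, b) so that the plane w.x + b = 0 lies close to `near`
    while w.x + b is about -1 on `far`.

    Minimizes 0.5*||N u||^2 + 0.5*c*||e + F u||^2 with N = [near, 1] and
    F = [far, 1]; the normal equations are (F^T F + N^T N / c) u = -F^T e.
    `eps` is a small ridge term (an assumption) that keeps the system solvable.
    """
    N = [list(p) + [1.0] for p in near]
    F = [list(p) + [1.0] for p in far]
    GN, GF = gram(N), gram(F)
    n = len(GN)
    M = [[GF[i][j] + GN[i][j] / c + (eps if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    rhs = [-sum(r[i] for r in F) for i in range(n)]
    return solve(M, rhs)


def distance(u, x):
    """Perpendicular distance from point x to the plane u = (w, b)."""
    w, b = u[:-1], u[-1]
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def predict(u_pos, u_neg, x):
    """Return +1 if x is nearer the positive-class plane, else -1."""
    return 1 if distance(u_pos, x) <= distance(u_neg, x) else -1


# Hypothetical two-feature flow samples (values invented for illustration).
p2p = [(4.0, 5.0), (5.0, 4.0), (5.0, 6.0), (6.0, 5.0)]
other = [(0.0, 1.0), (1.0, 0.0), (1.0, 2.0), (2.0, 1.0)]

u_p2p = fit_plane(p2p, other)    # plane hugging the P2P samples
u_other = fit_plane(other, p2p)  # plane hugging the non-P2P samples

print(predict(u_p2p, u_other, (5.5, 4.5)))  # sample near the P2P cluster
print(predict(u_p2p, u_other, (0.5, 1.0)))  # sample near the other cluster
```

Each plane is obtained from a system whose size depends only on the number of features, which is why this variant is convenient for a short illustration; the time-cost figures reported in the abstract refer to the authors' own TWSVM implementation, not to this sketch.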

Key words: P2P traffic identification, supervised machine learning, twin support vector machine (TWSVM), K-means ensemble, time cost

CLC number: