Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (3): 705-710. DOI: 10.11772/j.issn.1001-9081.2017.03.705

User classification method based on multiple-layer network traffic analysis

MU Tao, CHEN Wei, CHEN Songjian   

  1. School of Computer Science & Technology, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210023, China
  • Received: 2016-08-01 Revised: 2016-10-19 Online: 2017-03-22 Published: 2017-03-10
  • Supported by:
    This work is supported by the National Natural Science Foundation of China (61202353, 61272084).


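  • Corresponding author: CHEN Wei
  • About the authors: MU Tao (born 1992), female, from Linxiang, Hunan, M.S. candidate; research interests: wireless network security and user privacy protection. CHEN Wei (born 1979), male, from Huai'an, Jiangsu, professor, Ph.D., CCF member; research interests: wireless sensors and network security. CHEN Songjian (born 1993), male, from Suzhou, Jiangsu, M.S. candidate; research interests: wireless network security and user privacy protection.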

Abstract: Accurate classification of users plays an important role in improving the quality of customized services, but out of privacy concerns users often do not cooperate with network service providers and refuse to provide personal information such as geographic location and hobbies. To address this problem, multi-layer network traffic, including the network layer and the application layer, was analyzed under the premise of protecting user privacy. Machine learning methods such as K-means clustering and the random forest algorithm were then used to predict the type of the user's geographic location (such as apartment or campus) and the user's hobbies, and the relationship between geographic location types and user interests was analyzed to improve the accuracy of user classification. The experimental results show that the proposed scheme can adaptively assign users to user types and geographic location types, and that correlating a user's geographic location type with the user type improves the accuracy of user behavior analysis.

Key words: traffic classification, geographic localization, user preference, K-means clustering, random forest
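
The clustering step named in the abstract can be illustrated with a minimal sketch. Everything specific here is a hypothetical assumption for illustration, not the authors' features or implementation: each user is represented by the share of their traffic falling into three made-up application categories (video, social, gaming), and the initial centroids are chosen by a deterministic farthest-point rule so that the result is reproducible. The random forest stage is not shown.

```python
from math import dist

def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm on tuples of floats; returns (centroids, labels)."""
    # Deterministic farthest-point initialisation (an assumption of this
    # sketch, chosen for reproducibility; the abstract does not specify one).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

# Hypothetical per-user traffic shares: (video, social, gaming).
users = [
    (0.80, 0.15, 0.05),
    (0.75, 0.20, 0.05),
    (0.10, 0.80, 0.10),
    (0.15, 0.75, 0.10),
    (0.05, 0.10, 0.85),
    (0.10, 0.05, 0.85),
]

centroids, labels = kmeans(users, k=3)
for user, label in zip(users, labels):
    print(user, "-> cluster", label)
```

In this toy data, users with similar traffic profiles end up in the same cluster. In a pipeline like the one the abstract outlines, such cluster labels could then serve as user types to be related to geographic location types.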


