User classification method based on multiple-layer network traffic analysis

Journal of Computer Applications, 2017, Vol. 37, Issue (3): 705-710. DOI: 10.11772/j.issn.1001-9081.2017.03.705

MU Tao, CHEN Wei, CHEN Songjian

School of Computer Science & Technology, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210023, China

Received: 2016-08-01 Revised: 2016-10-19 Online: 2017-03-10 Published: 2017-03-22
Corresponding author: CHEN Wei
About the authors: MU Tao (born 1992), female, from Linxiang, Hunan, master's student; main research interests: wireless network security and user privacy protection. CHEN Wei (born 1979), male, from Huai'an, Jiangsu, professor, Ph.D., CCF member; main research interests: wireless sensors and network security. CHEN Songjian (born 1993), male, from Suzhou, Jiangsu, master's student; main research interests: wireless network security and user privacy protection.
Supported by:
This work is supported by the National Natural Science Foundation of China (61202353, 61272084).

Abstract

Abstract: Accurate user classification plays an important role in improving the quality of customized services. However, out of privacy concerns, users often decline to cooperate with network service providers and refuse to provide personal information such as geographic location and interests. To address this problem while protecting user privacy, the proposed method analyzes network traffic at multiple layers, including the network layer and the application layer, and then applies machine learning methods such as K-means clustering and the random forest algorithm to predict the type of a user's geographic location (for example, apartment or campus) and the user's interests. The relationship between geographic location type and user interests is also analyzed to improve the accuracy of user classification. Experimental results show that the scheme can adaptively assign users to user types and geographic location types, and that correlating a user's geographic location type with the user type improves the accuracy of user behavior analysis.

Key words: traffic classification, geographic location, user preference, K-means clustering, random forest
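
As a rough illustration of the clustering step named in the abstract, the following minimal sketch runs plain K-means over per-user traffic feature vectors. It is not the authors' implementation: the two features (share of traffic that is video streaming, share of traffic sent at night), the sample values, and the function names are all hypothetical, chosen only to show how users with similar traffic profiles end up in the same group.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points, k, iterations=100, seed=0):
    """Plain K-means: returns (centroids, cluster index for each point)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct points drawn from the data.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return centroids, assignment


# Hypothetical per-user features: (video traffic share, night-time traffic share).
users = [
    (0.80, 0.90), (0.75, 0.85), (0.82, 0.88),  # heavy evening streaming
    (0.10, 0.20), (0.15, 0.10), (0.12, 0.18),  # light daytime use
]

centroids, labels = kmeans(users, k=2)
print(labels)
```

In a fuller pipeline along the lines the abstract describes, the resulting cluster labels could then serve as inputs or targets for a supervised classifier such as a random forest; that step is omitted here.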

CLC number: TP393.08

MU Tao, CHEN Wei, CHEN Songjian. User classification method based on multiple-layer network traffic analysis[J]. Journal of Computer Applications, 2017, 37(3): 705-710.

