User classification method based on multiple-layer network traffic analysis

Journal of Computer Applications, 2017, Vol. 37, Issue (3): 705-710. DOI: 10.11772/j.issn.1001-9081.2017.03.705

MU Tao, CHEN Wei, CHEN Songjian

School of Computer Science & Technology, Nanjing University of Posts and Telecommunications, Nanjing Jiangsu 210023, China

Received: 2016-08-01 Revised: 2016-10-19 Online: 2017-03-22 Published: 2017-03-10
Corresponding author: CHEN Wei
About the authors: MU Tao (born 1992), female, from Linxiang, Hunan, M.S. candidate; research interests: wireless network security, user privacy protection. CHEN Wei (born 1979), male, from Huai'an, Jiangsu, professor, Ph.D., CCF member; research interests: wireless sensors, network security. CHEN Songjian (born 1993), male, from Suzhou, Jiangsu, M.S. candidate; research interests: wireless network security, user privacy protection.
Supported by:
This work is supported by the National Natural Science Foundation of China (61202353, 61272084).

Abstract

Abstract: Accurate user classification plays an important role in improving the quality of customized services, but out of privacy concerns users often do not cooperate with network service providers and refuse to provide personal information such as geographic location and interests. To solve this problem while protecting user privacy, multi-layer network traffic (network layer, application layer, and so on) is analyzed, and machine learning methods such as K-means clustering and the random forest algorithm are then used to predict the user's geographic location type (for example, apartment or campus) and interests. The relationship between geographic location type and user interests is also analyzed to improve the accuracy of user classification. Experimental results show that the proposed scheme can adaptively assign users to user types and geographic location types, and that correlating a user's geographic location type with the user type improves the accuracy of user behavior analysis.

Keywords: traffic classification, geographic location, user preference, K-means clustering, random forest
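
As a rough illustration of the clustering step named in the abstract, the following minimal sketch groups users with plain K-means (Lloyd's algorithm). The user IDs, the three features (share of each user's traffic going to video, social, and shopping services), and the farthest-point initialization are all assumptions made for this example. The features and settings actually used in the paper are not stated on this page.

```python
import math

def kmeans(points, k, iterations=50):
    """Plain Lloyd's K-means; returns (centroids, labels).

    Initial centroids come from a deterministic farthest-point
    heuristic so the sketch is reproducible. This is an assumption
    for illustration, not the paper's initialization.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Hypothetical per-user features: traffic share for (video, social, shopping).
users = {
    "u1": (0.80, 0.15, 0.05),
    "u2": (0.75, 0.20, 0.05),
    "u3": (0.10, 0.80, 0.10),
    "u4": (0.15, 0.75, 0.10),
    "u5": (0.05, 0.10, 0.85),
    "u6": (0.10, 0.05, 0.85),
}

centroids, labels = kmeans(list(users.values()), k=3)
for user_id, label in zip(users, labels):
    print(user_id, "-> cluster", label)
```

In this toy data, users with similar traffic mixes end up in the same cluster. The resulting cluster labels could then serve as user types, or as inputs to a supervised classifier such as a random forest, in the spirit of the approach the abstract describes.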

CLC number:

TP393.08

MU Tao, CHEN Wei, CHEN Songjian. User classification method based on multiple-layer network traffic analysis[J]. Journal of Computer Applications, 2017, 37(3): 705-710.
