Abstract: Accurate classification of users plays an important role in improving the quality of customized services. However, out of privacy concerns, users are often unwilling to provide network service providers with personal information such as their location and hobbies. To address this problem, multi-layer network traffic, including network-layer and application-layer traffic, is analyzed under the premise of protecting user privacy, and machine learning methods such as K-means clustering and the random forest algorithm are then used to predict users' geographic location types (such as apartments and campuses) and hobbies. The relationship between geographic location types and user interests is further analyzed to improve the accuracy of user classification. Experimental results show that the proposed scheme can adaptively partition user types and geographic location types, and that correlating a user's geographic location type with the user type improves the accuracy of user behavior analysis.
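The abstract names K-means clustering as the tool for partitioning users into types. As a minimal sketch, assuming each user has already been summarized as a vector of per-category traffic shares (a hypothetical feature set, not necessarily the one used in the paper), Lloyd-style K-means can group users as follows. The farthest-first initialization is a simple deterministic stand-in and is not claimed to be the authors' initialization scheme.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iters: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's K-means with farthest-first initialization; returns (centroids, labels)."""
    # Farthest-first init (an assumption for determinism): start from the first
    # point, then repeatedly add the point farthest from all chosen centroids.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each user joins the nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # no user changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical per-user features: share of bytes sent to [video, social, gaming] services.
users = [[0.80, 0.10, 0.10], [0.75, 0.15, 0.10], [0.85, 0.05, 0.10],
         [0.10, 0.10, 0.80], [0.05, 0.15, 0.80], [0.10, 0.05, 0.85]]
centroids, labels = kmeans(users, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Here the number of clusters `k` is fixed by hand; the abstract's "adaptive" partitioning implies some way of choosing it, which this sketch does not attempt to reproduce.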
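For predicting a user's geographic location type, the abstract cites the random forest algorithm. The following is a minimal from-scratch sketch of that technique (bootstrap-sampled CART trees with random feature subsets at each split, combined by majority vote), assuming two hypothetical time-of-day traffic features; the feature choice, the labels, and the parameters `n_trees`, `max_depth` and `n_feats` are illustrative assumptions rather than the paper's settings.

```python
import random
from collections import Counter
from typing import List, Sequence, Union

# A tree node is either a leaf (class label) or (feature, threshold, left, right).
Node = Union[str, tuple]


def gini(labels: List[str]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X: List[Sequence[float]], y: List[str], depth: int,
               max_depth: int, n_feats: int, rng: random.Random) -> Node:
    """Grow a CART-style tree, trying only a random feature subset at each split."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    best = None  # (weighted impurity, feature index, threshold)
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no useful split: fall back to a leaf
        return majority
    _, f, t = best

    def grow(idx: List[int]) -> Node:
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_feats, rng)

    return (f, t, grow([i for i, row in enumerate(X) if row[f] <= t]),
            grow([i for i, row in enumerate(X) if row[f] > t]))


def predict_tree(node: Node, row: Sequence[float]) -> str:
    """Follow threshold tests from the root down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class RandomForest:
    """Bagged CART trees with per-split feature subsampling and majority voting."""

    def __init__(self, n_trees: int = 25, max_depth: int = 3, n_feats: int = 1, seed: int = 0):
        self.n_trees, self.max_depth, self.n_feats = n_trees, max_depth, n_feats
        self.rng = random.Random(seed)
        self.trees: List[Node] = []

    def fit(self, X: List[Sequence[float]], y: List[str]) -> "RandomForest":
        n = len(X)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, self.n_feats, self.rng))
        return self

    def predict(self, row: Sequence[float]) -> str:
        votes = Counter(predict_tree(tree, row) for tree in self.trees)
        return votes.most_common(1)[0][0]


# Hypothetical features: [weekday 09:00-17:00 traffic share, 22:00-06:00 traffic share].
X = [[0.70, 0.05], [0.65, 0.10], [0.75, 0.08], [0.68, 0.06],
     [0.15, 0.45], [0.20, 0.50], [0.10, 0.40], [0.18, 0.55]]
y = ["campus"] * 4 + ["apartment"] * 4
forest = RandomForest(n_trees=25, seed=0).fit(X, y)
print(forest.predict([0.72, 0.04]), forest.predict([0.08, 0.48]))
```

Sampling a random feature subset at every split, not once per tree, is what distinguishes a random forest from plain bagging; in practice a library implementation would normally replace this hand-written version.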
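The abstract also correlates geographic location type with user type but does not say how. One simple way to express such a relationship, offered here only as an assumed illustration, is to estimate the conditional frequency of each user type given a location type from labeled (location type, user type) pairs; the observations below are invented for the example.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def conditional_distribution(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(user type | location type) from (location type, user type) pairs."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for location, user_type in pairs:
        counts[location][user_type] += 1
    # Normalize each location's counts into relative frequencies.
    return {loc: {u: n / sum(c.values()) for u, n in c.items()}
            for loc, c in counts.items()}


def most_likely_user_type(table: Dict[str, Dict[str, float]], location: str) -> str:
    """Pick the user type with the highest conditional frequency at a location type."""
    dist = table[location]
    return max(dist, key=lambda u: dist[u])


# Hypothetical labeled observations: (location type, user type).
observations = [("campus", "gaming"), ("campus", "gaming"), ("campus", "video"),
                ("campus", "gaming"), ("apartment", "video"), ("apartment", "video"),
                ("apartment", "shopping"), ("apartment", "video")]
table = conditional_distribution(observations)
print(table["campus"])                             # {'gaming': 0.75, 'video': 0.25}
print(most_likely_user_type(table, "apartment"))   # video
```

A table like this could serve as a prior when a user's type is ambiguous from traffic alone, which is one plausible reading of how location type might sharpen user classification.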