Fast malicious domain name detection algorithm based on lexical features

doi:10.11772/j.issn.1001-9081.2018051118

Journal of Computer Applications, 2019, Vol. 39, Issue (1): 227-231. DOI: 10.11772/j.issn.1001-9081.2018051118

ZHAO Hong, CHANG Zhaobin, WANG Le

School of Computer and Communication, Lanzhou University of Technology, Lanzhou, Gansu 730050, China

Received: 2018-05-30 Revised: 2018-08-01 Online: 2019-01-10 Published: 2019-01-21
Corresponding author: CHANG Zhaobin
About the authors: ZHAO Hong (born 1971), male, from Xihe, Gansu; professor, Ph.D., CCF member; research interests: parallel and distributed processing, natural language processing, deep learning. CHANG Zhaobin (born 1995), male, from Huining, Gansu; master's student, CCF member; research interests: natural language processing, cyberspace security, deep learning. WANG Le (born 1994), female, from Yumen, Gansu; master's student, CCF member; research interests: natural language processing, deep learning, sentiment analysis.
Supported by:
National Natural Science Foundation of China (51668043); CERNET Next Generation Internet Technology Innovation Project (NGII20160311, NGII20160112).

Abstract

Abstract: Malicious domain name attacks occur frequently on the Internet, and existing domain name detection methods lack real-time performance. To address this problem, a fast malicious domain name detection algorithm based on lexical features is proposed. Based on the characteristics of malicious domain names, all domain names to be tested are first normalized by length and assigned weights. A clustering algorithm then divides the domain names to be tested into several groups, and an improved heap sort algorithm ranks the groups by priority, computed from the sum of the weights within each group. In descending order of priority, the edit distance between each domain name in a group and the domain names on a blacklist is calculated. Finally, malicious domain names are quickly identified from the edit distance values. Experimental results show that, compared with a detection algorithm that uses only domain name semantics and one that uses only domain name lexical features, the proposed algorithm improves accuracy by 1.7% and 2.5% respectively and detection speed by 13.9% and 6.8% respectively, giving higher accuracy and better real-time performance.

Key words: malicious domain name, lexical feature, detection algorithm, edit distance, real-time performance
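The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the weighting formula, the clustering method, the heap sort improvement, or the decision threshold. The names `length_weight`, `detect`, `bucket`, and `threshold` are therefore hypothetical; bucketing by length stands in for the clustering step, and the standard library `heapq` stands in for the improved heap sort. Only the edit distance is the standard Levenshtein recurrence.

```python
import heapq
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def length_weight(domain: str, max_len: int) -> float:
    # Assumption: weight is the length scaled into (0, 1]; the paper's
    # actual weighting is not given in the abstract.
    return len(domain) / max_len


def detect(domains, blacklist, bucket=5, threshold=2):
    """Return domains within `threshold` edits of any blacklisted name,
    visiting groups in descending order of total weight."""
    if not domains:
        return []
    max_len = max(len(d) for d in domains)

    # Assumption: bucketing by length replaces the clustering step.
    groups = defaultdict(list)
    for d in domains:
        groups[len(d) // bucket].append(d)

    # heapq is a min-heap, so negate the weight sum to pop the
    # highest-priority group first.
    heap = [(-sum(length_weight(d, max_len) for d in g), key)
            for key, g in groups.items()]
    heapq.heapify(heap)

    flagged = []
    while heap:
        _, key = heapq.heappop(heap)
        for d in groups[key]:
            if any(edit_distance(d, b) <= threshold for b in blacklist):
                flagged.append(d)
    return flagged


if __name__ == "__main__":
    tested = ["paypa1.com", "example.org", "g00gle.com", "a.io"]
    print(detect(tested, ["paypal.com", "google.com"]))
    # prints ['paypa1.com', 'g00gle.com']
```

Processing the heaviest group first means the names judged most suspicious under the assumed weighting reach the blacklist comparison earliest, which is the real-time motivation stated in the abstract.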


ZHAO Hong, CHANG Zhaobin, WANG Le. Fast malicious domain name detection algorithm based on lexical features[J]. Journal of Computer Applications, 2019, 39(1): 227-231.


