Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (1): 227-231. DOI: 10.11772/j.issn.1001-9081.2018051118

• Cyberspace security •

Fast malicious domain name detection algorithm based on lexical features

ZHAO Hong, CHANG Zhaobin, WANG Le

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou Gansu 730050, China
  • Received: 2018-05-30 Revised: 2018-08-01 Online: 2019-01-10 Published: 2019-01-21
  • Corresponding author: CHANG Zhaobin
  • About the authors: ZHAO Hong (born 1971), male, from Xihe, Gansu; professor, Ph.D., CCF member; research interests: parallel and distributed processing, natural language processing, deep learning. CHANG Zhaobin (born 1995), male, from Huining, Gansu; M.S. candidate, CCF member; research interests: natural language processing, cyberspace security, deep learning. WANG Le (born 1994), female, from Yumen, Gansu; M.S. candidate, CCF member; research interests: natural language processing, deep learning, sentiment analysis.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (51668043) and the CERNET Innovation Project (NGII20160311, NGII20160112).

Abstract: To address the frequent malicious domain name attacks on the Internet and the poor real-time performance of existing domain name detection methods, a fast malicious domain name detection algorithm based on lexical features is proposed. Based on the characteristics of malicious domain names, all domain names to be tested are first normalized by length and assigned weights. A clustering algorithm then divides the domain names to be tested into several groups, and an improved heap sort algorithm computes the priority of each group from the sum of the weights in that group. In descending order of priority, the edit distance between each domain name in each group and the domain names on the blacklist is calculated. Finally, malicious domain names are quickly identified according to the edit distance values. Experimental results show that, compared with a detection algorithm that uses only domain name semantics and one that uses only domain name lexical features, the proposed algorithm improves accuracy by 1.7% and 2.5%, respectively, and detection speed by 13.9% and 6.8%, respectively, giving it higher accuracy and better real-time performance.

Key words: malicious domain name, lexical feature, detection algorithm, edit distance, real-time performance
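
The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the weight formula, the clustering algorithm, the heap sort improvement, or the decision threshold. Here the weight is assumed to be the domain length divided by the longest length, length buckets stand in for clustering, the standard `heapq` module stands in for the improved heap sort, and the names `detect`, `threshold` and `bucket` are hypothetical.

```python
import heapq
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep the row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def detect(domains, blacklist, threshold=2, bucket=5):
    """Return domains within `threshold` edits of any blacklist entry.

    Groups are visited in descending order of their summed weights.
    The weighting, grouping and threshold are assumptions (see above).
    """
    if not domains:
        return []
    max_len = max(len(d) for d in domains)
    groups = defaultdict(list)
    for d in domains:
        weight = len(d) / max_len                 # assumed length-based weight
        groups[len(d) // bucket].append((weight, d))  # assumed "clustering"

    # heapq is a min-heap, so negate the sums to pop the heaviest group first.
    heap = [(-sum(w for w, _ in members), key)
            for key, members in groups.items()]
    heapq.heapify(heap)

    flagged = []
    while heap:
        _, key = heapq.heappop(heap)
        for _, d in groups[key]:
            if any(edit_distance(d, b) <= threshold for b in blacklist):
                flagged.append(d)
    return flagged


if __name__ == "__main__":
    candidates = ["paypa1-login.com", "example.org",
                  "g00gle.com", "openstreetmap.org"]
    known_bad = ["paypal-login.com", "go0gle.com"]
    print(detect(candidates, known_bad))
    # prints ['paypa1-login.com', 'g00gle.com']
```

Processing the highest-priority group first means that, under these assumptions, the domains judged most suspicious by weight are checked against the blacklist earliest, which is the intuition behind the real-time claim in the abstract.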

CLC Number: