Journal of Computer Applications, 2020, Vol. 40, Issue (8): 2279-2285. DOI: 10.11772/j.issn.1001-9081.2019111952

Cyber security

Lightweight detection technology of typosquatting based on visual features

ZHU Yi, NING Zhenhu, ZHOU Yihua   

  1. Information Department, Beijing University of Technology, Beijing 100124, China
  • Received: 2019-11-15; Revised: 2020-01-10; Online: 2020-08-10; Published: 2020-06-29
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61971014), the Natural Science Foundation of Qinghai Province (2017-ZJ-912), the Qinghai Province Science and Technology Plan Project (2018-ZJ-753), and the CCF-NSFOCUS Kunpeng Research Fund (2018003).

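  • Corresponding author: NING Zhenhu (born 1983), male, from Beijing, lecturer, Ph.D., CCF member. Research interests: information security, malicious code prevention, Internet of Things security. Email: nzh41034@163.com
  • About the authors: ZHU Yi (born 1996), female, from Enshi, Hubei, M.S. candidate. Research interests: information security, malicious domain name detection, deep learning. ZHOU Yihua (born 1969), male, from Beijing, associate professor, Ph.D. Research interests: network and information security, multimedia information retrieval and content security, cryptography.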

Abstract: In recent years, typosquatting-related attacks such as botnets, domain name hijacking and phishing websites have become increasingly frequent, seriously threatening the security of society and individuals. Typosquatting detection has therefore become an important part of network protection. Current typosquatting detection mainly targets public domain names, and the detection methods are mostly based on edit distance, which can hardly reflect the visual characteristics of domain names. In addition, using information related to the given domain names for the determination can improve detection efficiency, but it also introduces considerable additional overhead. Therefore, a lightweight detection strategy based only on domain name strings was adopted, and an edit distance algorithm based on visual features was proposed that jointly considers how character position, character similarity and operation type affect the visual appearance of a domain name. Following the characteristics of typosquatting, the algorithm first preprocesses the domain names, then assigns different weights to characters according to their position, their similarity and the operation type, and finally determines typosquatting by computing the edit distance value. Experimental results show that, compared with the detection method based on plain edit distance, the proposed lightweight visual-feature-based method increases the F1 score by 5.98% and 13.56% when the threshold is 1 and 2, respectively, which demonstrates good detection performance.
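The abstract describes the algorithm only at a high level: preprocess the domain names, weight each edit by character position, character similarity and operation type, then compare the resulting edit distance against a threshold. The Python sketch below shows one way such a visually weighted edit distance could be assembled, next to the plain edit distance used as the baseline. Everything specific in it is a hypothetical assumption rather than the authors' parameters: the confusion table `VISUAL_COST`, the rule in `position_weight` (first and last characters weigh more), the `OP_WEIGHT` values, and the `preprocess` rules (lowercase, strip `www.`, drop the last label as the TLD).

```python
# Minimal sketch of a visually weighted edit distance for typosquatting checks.
# All weights and confusion costs are HYPOTHETICAL placeholders, not the paper's values.

# Hypothetical substitution costs for visually confusable character pairs
# (lower = harder to tell apart). Unlisted distinct pairs cost 1.0.
VISUAL_COST = {
    frozenset("0o"): 0.2,
    frozenset("1l"): 0.2,
    frozenset("li"): 0.3,
    frozenset("uv"): 0.4,
}

# Hypothetical weights per edit operation type.
OP_WEIGHT = {"insert": 1.0, "delete": 1.0, "substitute": 1.0}


def preprocess(domain: str) -> str:
    """Lowercase, drop a leading 'www.' and the last label (assumed to be the TLD)."""
    d = domain.strip().lower()
    if d.startswith("www."):
        d = d[4:]
    labels = d.split(".")
    return ".".join(labels[:-1]) if len(labels) > 1 else d


def position_weight(i: int, n: int) -> float:
    """Hypothetical rule: edits at the first or last reference character count more."""
    return 1.0 if i == 0 or i == n - 1 else 0.8


def substitution_cost(a: str, b: str) -> float:
    """0 for identical characters, a reduced cost for confusable pairs, else 1."""
    if a == b:
        return 0.0
    return VISUAL_COST.get(frozenset((a, b)), 1.0)


def levenshtein(s: str, t: str) -> int:
    """Plain unit-cost edit distance (the baseline the visual variant refines)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def visual_edit_distance(ref: str, cand: str) -> float:
    """Weighted cost of transforming the reference (legitimate) string into cand."""
    n, m = len(ref), len(cand)

    def ins_pos(i: int) -> int:
        # An insertion after ref[:i] is charged at reference position i, clamped to [0, n-1].
        return max(0, min(i, n - 1))

    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + OP_WEIGHT["delete"] * position_weight(i - 1, n)
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + OP_WEIGHT["insert"] * position_weight(ins_pos(0), n)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = position_weight(i - 1, n)
            d[i][j] = min(
                d[i - 1][j] + OP_WEIGHT["delete"] * w,
                d[i][j - 1] + OP_WEIGHT["insert"] * position_weight(ins_pos(i), n),
                d[i - 1][j - 1]
                + OP_WEIGHT["substitute"] * w * substitution_cost(ref[i - 1], cand[j - 1]),
            )
    return d[n][m]


def is_typosquat(candidate: str, legit: str, threshold: float) -> bool:
    """Flag a candidate whose distance to the legitimate domain lies in (0, threshold]."""
    dist = visual_edit_distance(preprocess(legit), preprocess(candidate))
    return 0 < dist <= threshold
```

For example, `levenshtein("google", "g0ogle")` is 1, while `visual_edit_distance("google", "g0ogle")` is about 0.16, because the substitution involves a confusable pair in a middle position. With a threshold of 1, both `g0ogle.com` and `gogle.com` would be flagged against `google.com`. A single-character dynamic program like this cannot express multi-character look-alikes (for instance `rn` versus `m`), so a fuller implementation would need an extended operation set and a calibrated confusion table.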

Key words: typosquatting, edit distance, visual similarity, detection algorithm, lightweight
