Identifier obfuscation method based on low level virtual machine

doi:10.11772/j.issn.1001-9081.2021071166

Journal of Computer Applications, 2022, Vol. 42, Issue (8): 2540-2547.

• Computer software technology •

Dajiang TIAN, Chengyang LI, Tianbo HUANG, Weiping WEN

School of Software and Microelectronics, Peking University, Beijing 102600, China

Received: 2021-07-07 Revised: 2021-09-14 Accepted: 2021-09-18 Online: 2021-10-11 Published: 2022-08-10
Contact: Weiping WEN
About author: TIAN Dajiang, born in 1997, from Huanggang, Hubei, CCF member. His research interests include code obfuscation.
LI Chengyang, born in 1996, from Linyi, Shandong, M. S. candidate. His research interests include code obfuscation.
HUANG Tianbo, born in 1997, from Handan, Hebei, M. S. candidate. His research interests include cyberspace security, malicious code detection, and code obfuscation.
WEN Weiping, born in 1976, from Yiyang, Hunan, Ph. D., professor. His research interests include system and network security, big data and cloud security, and intelligent computing security.
Supported by:
Huawei-Peking University School-Enterprise Cooperation Project (2020001763)

Abstract

Abstract:

Most existing code obfuscation solutions are tied to a specific programming language or platform and therefore lack generality, and control flow obfuscation and data obfuscation introduce additional overhead. To address these problems, an identifier obfuscation method based on Low Level Virtual Machine (LLVM) is proposed. The method implements four identifier obfuscation algorithms: the random identifier algorithm, the overload induction algorithm, the abnormal identifier algorithm, and the high-frequency word replacement algorithm. A new hybrid (mixed) obfuscation algorithm is also designed by combining them. The method works in three steps. First, the function names that meet the obfuscation criteria are selected from the intermediate files produced by the compiler front end. Second, these function names are processed by a specific obfuscation algorithm. Finally, a specific compilation back end transforms the obfuscated files into binary files. The method applies to the languages supported by LLVM and does not affect the normal functionality of the program. Across different programming languages, the time overhead is within 20% and the space overhead hardly increases. The average obfuscation ratio of the programs is 77.5%, and theoretical analysis indicates that the proposed hybrid identifier algorithm provides stronger concealment than a single replacement algorithm or a single overload algorithm. Experimental results show that the proposed method has low performance overhead, strong concealment, and wide applicability.

Key words: software protection, code obfuscation, identifier obfuscation, Low Level Virtual Machine (LLVM), obfuscation method
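The three steps in the abstract (select candidate function names from the intermediate file, rename them, hand the result to a back end) can be sketched on textual LLVM IR. This is only a toy illustration, not the authors' implementation: it rewrites IR text with regular expressions, whereas a real tool would more likely be an LLVM pass. The exclusion list `KEEP`, the helper names, and the definition of the obfuscation ratio as renamed functions divided by defined functions are all assumptions made for this example. The IR itself could come from a front end such as `clang -S -emit-llvm`, and the rewritten file could be passed to a back end such as `llc`.

```python
import random
import re
import string

# Toy LLVM IR module; the function name `max` is borrowed from Table 1.
IR_TEXT = """\
declare i32 @printf(i8*, ...)

define i32 @max(i32 %a, i32 %b) {
entry:
  %cmp = icmp sgt i32 %a, %b
  %res = select i1 %cmp, i32 %a, i32 %b
  ret i32 %res
}

define i32 @main() {
entry:
  %m = call i32 @max(i32 1, i32 2)
  ret i32 %m
}
"""

# Matches the function name in a top-level `define` line of textual IR.
DEFINE_RE = re.compile(r"^define\b[^@\n]*@([A-Za-z_][\w.]*)\(", re.MULTILINE)

# Hypothetical exclusion list: entry points must keep their names.
KEEP = {"main"}


def defined_functions(ir):
    """Return the names of all functions defined (not just declared) in the IR."""
    return DEFINE_RE.findall(ir)


def select_candidates(ir):
    """Step 1: keep defined functions that are not on the exclusion list."""
    return [name for name in defined_functions(ir) if name not in KEEP]


def random_identifier(rng, length=11):
    """Build a random name: one letter followed by letters and digits."""
    alphabet = string.ascii_letters + string.digits
    head = rng.choice(string.ascii_letters)
    return head + "".join(rng.choice(alphabet) for _ in range(length - 1))


def build_mapping(candidates, taken, seed=0):
    """Step 2: give each candidate a fresh name that collides with nothing."""
    rng = random.Random(seed)
    used = set(taken)
    mapping = {}
    for name in candidates:
        new_name = random_identifier(rng)
        while new_name in used:
            new_name = random_identifier(rng)
        used.add(new_name)
        mapping[name] = new_name
    return mapping


def rename(ir, mapping):
    """Rewrite every `@name` reference, covering definitions and call sites."""
    if not mapping:
        return ir
    names = "|".join(re.escape(n) for n in mapping)
    pattern = re.compile(r"@(" + names + r")(?![\w.])")
    return pattern.sub(lambda m: "@" + mapping[m.group(1)], ir)


def obfuscation_ratio(ir, mapping):
    """Assumed metric: renamed functions divided by defined functions."""
    total = len(defined_functions(ir))
    return len(mapping) / total if total else 0.0


if __name__ == "__main__":
    candidates = select_candidates(IR_TEXT)
    taken = re.findall(r"@([\w.]+)", IR_TEXT)
    mapping = build_mapping(candidates, taken)
    print(rename(IR_TEXT, mapping))
    print("obfuscation ratio:", obfuscation_ratio(IR_TEXT, mapping))
```

External declarations such as `printf` are left alone because their names must still resolve at link time. Quoted or otherwise unusual IR symbol names are not handled by this sketch.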

CLC Number:

TP312

Dajiang TIAN, Chengyang LI, Tianbo HUANG, Weiping WEN. Identifier obfuscation method based on low level virtual machine[J]. Journal of Computer Applications, 2022, 42(8): 2540-2547.

Figures/Tables 7

Fig. 1 LLVM three-stage design

Fig. 2 LLVM compiler architecture

Tab. 1 Identifier obfuscation algorithm examples

Original function	Random identifier algorithm	High-frequency word replacement algorithm	Abnormal identifier algorithm 1	Abnormal identifier algorithm 2	Overload induction algorithm
height(Node*)	ieqitsHnYrg	took	export	______	newNode(Node*)
max(int, int)	e42sWLoECD9	london	short	________	newNode(int, int)
newNode(int)	m34nK7081V0	tech	operator	__________	newNode(int)
leftRotate(Node*)	q7U24yHN4W2	later	long	____________	leftRotate(Node*)
insert(Node*, int)	msauw40i8tj	rich	auto	_______________	leftRotate(Node*, int)
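The columns of Table 1 suggest how each renaming algorithm behaves, and the following sketch mimics them on the same five functions. It is an interpretation of the table, not the paper's code. The word lists are simply the values that appear in the table, the underscore lengths use a fixed step chosen for the example, and the greedy assignment in `overload_induction` is an assumed strategy. All function names here are hypothetical.

```python
import random
import string
from collections import defaultdict

# (name, parameter types) pairs taken from Table 1.
FUNCTIONS = [
    ("height", ("Node*",)),
    ("max", ("int", "int")),
    ("newNode", ("int",)),
    ("leftRotate", ("Node*",)),
    ("insert", ("Node*", "int")),
]

# Word lists copied from the table; the paper's full dictionaries are unknown.
HIGH_FREQUENCY_WORDS = ["took", "london", "tech", "later", "rich"]
KEYWORDS = ["export", "short", "operator", "long", "auto"]


def random_identifiers(functions, rng, length=11):
    """Random identifier algorithm: a letter followed by random letters/digits."""
    alphabet = string.ascii_letters + string.digits
    names = []
    while len(names) < len(functions):
        candidate = rng.choice(string.ascii_letters) + "".join(
            rng.choice(alphabet) for _ in range(length - 1))
        if candidate not in names:
            names.append(candidate)
    return names


def word_replacement(functions, words, rng):
    """Replace each name with a distinct word drawn from a word list."""
    return rng.sample(words, len(functions))


def underscore_names(functions, shortest=6, step=2):
    """Abnormal identifiers made only of underscores, told apart by length."""
    return ["_" * (shortest + step * i) for i in range(len(functions))]


def overload_induction(functions, pool):
    """Reuse names from `pool` as often as possible (greedy assignment).

    Two functions may share a name as long as their parameter lists differ.
    """
    used = defaultdict(set)  # name -> parameter lists already bound to it
    names = []
    for _, params in functions:
        for candidate in pool:
            if params not in used[candidate]:
                used[candidate].add(params)
                names.append(candidate)
                break
        else:
            raise ValueError("name pool too small for these signatures")
    return names


if __name__ == "__main__":
    rng = random.Random(0)
    columns = {
        "random": random_identifiers(FUNCTIONS, rng),
        "high-frequency": word_replacement(FUNCTIONS, HIGH_FREQUENCY_WORDS, rng),
        "keyword": word_replacement(FUNCTIONS, KEYWORDS, rng),
        "underscore": underscore_names(FUNCTIONS),
        "overload": overload_induction(FUNCTIONS, ["newNode", "leftRotate"]),
    }
    for (name, params), *renamed in zip(FUNCTIONS, *columns.values()):
        print(f"{name}({', '.join(params)}) ->", renamed)
```

With the name pool `["newNode", "leftRotate"]`, the greedy rule matches the first four rows of the overload induction column. It assigns `insert(Node*, int)` to `newNode`, whereas the table shows `leftRotate`, so the paper's assignment order is presumably different. Either result keeps every (name, parameter list) pair unique.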

Fig. 3 Mixed identifier obfuscation algorithm process

Fig. 4 Performance analysis

Fig. 5 Obfuscation effects of different identifier obfuscation algorithms

Fig. 6 Obfuscation ratio of the mixed obfuscation algorithm

