Journal of Computer Applications, 2024, Vol. 44, Issue (7): 2160-2167. DOI: 10.11772/j.issn.1001-9081.2023070992
• Computer software technology •
Binary code identification based on user system call sequences
Haixiang HUANG, Shuanghe PENG, Ziyu ZHONG
Received: 2023-07-24
Revised: 2023-09-13
Accepted: 2023-09-21
Online: 2023-10-26
Published: 2024-07-10
Contact: Shuanghe PENG
About author: HUANG Haixiang, born in 1999 in Jiujiang, Jiangxi, M. S. candidate. His research interests include binary reverse analysis and vulnerability mining.
Haixiang HUANG, Shuanghe PENG, Ziyu ZHONG. Binary code identification based on user system call sequences[J]. Journal of Computer Applications, 2024, 44(7): 2160-2167.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023070992
%rax | System call name | %rdi | %rsi | %rdx | %r10 | %r8 | %r9 |
---|---|---|---|---|---|---|---|
0 | sys_read | fd | 0 | 1 | 0 | 0 | 0 |
1 | sys_write | fd | 0 | 1 | 0 | 0 | 0 |
2 | sys_open | 0 | 1 | 1 | 0 | 0 | 0 |
3 | sys_close | fd | 0 | 0 | 0 | 0 | 0 |
Tab. 1 Valid parameter table (excerpt)
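One way to read Tab. 1 is as a per-system-call mask over the six x86-64 argument registers. The sketch below is a minimal illustration under that reading, not the authors' implementation: it assumes `1` marks an argument that is kept, `0` an argument that is ignored, and `fd` an argument that is kept and tagged as a file descriptor. The names `VALID_PARAMS` and `filter_args` are hypothetical.

```python
from typing import Dict, Tuple, Union

# x86-64 Linux system call argument registers, in order.
ARG_REGISTERS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")

Flag = Union[int, str]

# Rows copied from Tab. 1 (excerpt), keyed by the %rax system call number.
# Assumed meaning of the flags: "fd" = file descriptor, 1 = valid, 0 = ignored.
VALID_PARAMS: Dict[int, Tuple[str, Tuple[Flag, ...]]] = {
    0: ("sys_read", ("fd", 0, 1, 0, 0, 0)),
    1: ("sys_write", ("fd", 0, 1, 0, 0, 0)),
    2: ("sys_open", (0, 1, 1, 0, 0, 0)),
    3: ("sys_close", ("fd", 0, 0, 0, 0, 0)),
}


def filter_args(rax: int, args: Tuple[int, ...]) -> Tuple[str, Dict[str, Tuple[str, int]]]:
    """Return the system call name and only the arguments the table marks as valid."""
    name, mask = VALID_PARAMS[rax]
    kept: Dict[str, Tuple[str, int]] = {}
    for reg, flag, value in zip(ARG_REGISTERS, mask, args):
        if flag == "fd":
            kept[reg] = ("fd", value)      # keep, tagged as a file descriptor
        elif flag == 1:
            kept[reg] = ("value", value)   # keep as a plain value
        # flag == 0: the argument is dropped
    return name, kept


if __name__ == "__main__":
    # A read(3, buf, 4096) call: the buffer pointer in %rsi is dropped.
    print(filter_args(0, (3, 0x7FFD0000, 4096, 0, 0, 0)))
```

Dropping masked-out arguments such as buffer addresses keeps only values that could plausibly be compared across builds of the same program; whether the paper uses the table exactly this way is an assumption of this sketch.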
Compilation environment | Bindiff7 | Radiff2 | IMF-SIM | DeepBinDiff | UstraceDiff |
---|---|---|---|---|---|
Average | 34.9 | 37.4 | 74.0 | 24.1 | 98.0 |
gcc5 | 39.8 | 37.3 | 70.4 | 31.7 | 94.4 |
gcc9 | 44.3 | 40.6 | | 24.7 | 94.4 |
clang16 | 41.5 | 45.8 | | 19.6 | 99.3 |
ollvm(clang4) | 34.0 | 42.2 | 77.5 | 23.5 | 99.3 |
ollvm -bcf | 27.3 | 27.7 | | 29.9 | 99.8 |
ollvm -sub | 23.5 | 41.0 | | 23.2 | 99.3 |
ollvm -fla | 33.8 | 27.3 | | 16.2 | 99.3 |
Tab. 2 Average homology scores with different optimization terms
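The Average row of Tab. 2 can be checked against the per-environment rows. The sketch below is a small arithmetic check, assuming that rows listing only four values omit the IMF-SIM cell (so the remaining values belong to Bindiff7, Radiff2, DeepBinDiff and UstraceDiff, in that order). The names `ROWS`, `REPORTED_AVERAGE` and `column_means` are hypothetical.

```python
from typing import Dict, Optional, Tuple

TOOLS = ("Bindiff7", "Radiff2", "IMF-SIM", "DeepBinDiff", "UstraceDiff")

Row = Tuple[Optional[float], ...]

# Per-environment rows from Tab. 2. None marks a cell with no value listed
# (assumption: four-value rows omit the IMF-SIM cell).
ROWS: Dict[str, Row] = {
    "gcc5": (39.8, 37.3, 70.4, 31.7, 94.4),
    "gcc9": (44.3, 40.6, None, 24.7, 94.4),
    "clang16": (41.5, 45.8, None, 19.6, 99.3),
    "ollvm(clang4)": (34.0, 42.2, 77.5, 23.5, 99.3),
    "ollvm -bcf": (27.3, 27.7, None, 29.9, 99.8),
    "ollvm -sub": (23.5, 41.0, None, 23.2, 99.3),
    "ollvm -fla": (33.8, 27.3, None, 16.2, 99.3),
}

# The Average row as printed in Tab. 2.
REPORTED_AVERAGE = dict(zip(TOOLS, (34.9, 37.4, 74.0, 24.1, 98.0)))


def column_means(rows: Dict[str, Row]) -> Dict[str, float]:
    """Mean of each tool's column, skipping cells with no value."""
    means: Dict[str, float] = {}
    for i, tool in enumerate(TOOLS):
        values = [row[i] for row in rows.values() if row[i] is not None]
        means[tool] = sum(values) / len(values)
    return means


if __name__ == "__main__":
    for tool, mean in column_means(ROWS).items():
        print(f"{tool}: computed {mean:.2f}, reported {REPORTED_AVERAGE[tool]}")
```

Under this column assignment, every computed mean agrees with the printed Average row to within rounding, with the IMF-SIM average taken over its two listed values only.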
Homology | Compiler type | Bindiff7 | Radiff2 | DeepBinDiff | UstraceDiff |
---|---|---|---|---|---|
Homologous | gcc | 93.5 | 69.4 | 69.3 | 97.8 |
Homologous | clang | 87.6 | 81.6 | 59.2 | 99.8 |
Homologous | Average | 90.6 | 75.5 | 64.3 | 98.8 |
Non-homologous | gcc | 60.7 | 21.0 | 46.7 | 32.0 |
Non-homologous | clang | 60.5 | 21.1 | 43.3 | 26.4 |
Non-homologous | Average | 60.6 | 21.1 | 45.0 | 29.2 |
Tab. 3 Average homology scores with different compiler versions
Evaluation method | Bindiff7 | Radiff2 | DeepBinDiff | UstraceDiff |
---|---|---|---|---|
Homologous programs from the two compilers | 66.3 | 55.8 | 27.0 | 95.7 |
Non-homologous programs from the two compilers | 44.0 | 30.6 | 19.2 | 29.9 |
Tab. 4 Average homology scores with different types of compilers
Evaluation method | Bindiff7 | Radiff2 | IMF-SIM | DeepBinDiff | UstraceDiff |
---|---|---|---|---|---|
Average | 60.2 | 54.7 | 64.7 | 55.3 | 99.8 |
No obfuscation vs. bcf obfuscation | 48.4 | 40.9 | 51.3 | 57.2 | 99.8 |
No obfuscation vs. sub obfuscation | 96.2 | 83.9 | 77.9 | 87.6 | 99.8 |
No obfuscation vs. fla obfuscation | 36.0 | 39.3 | 64.9 | 21.1 | 99.8 |
Tab. 5 Average homology scores with different obfuscation patterns
Evaluation method | Bindiff7 | Radiff2 | DeepBinDiff | UstraceDiff |
---|---|---|---|---|
Same-name programs | 68.5 | 27.1 | 58.6 | 72.5 |
Different-name programs | 57.4 | 20.1 | 43.2 | 24.0 |
Tab. 6 Average homology scores with different Coreutils versions