Journal of Computer Applications, 2022, Vol. 42, Issue (4): 985-998. DOI: 10.11772/j.issn.1001-9081.2021071267
Special Issue: The 36th CCF National Conference of Computer Applications (CCF NCCA 2021)
Bing XIA1,2, Jianmin PANG1, Xin ZHOU1, Zheng SHAN1
Received: 2021-07-15
Revised: 2021-08-23
Accepted: 2021-08-30
Online: 2021-08-23
Published: 2022-04-10
Contact: Jianmin PANG
About author: XIA Bing, born in 1981, male, native of Yongcheng, Henan, associate professor, Ph. D. candidate, CCF member. His research interests include network security and reverse engineering.
Bing XIA, Jianmin PANG, Xin ZHOU, Zheng SHAN. Research progress on binary code similarity search[J]. Journal of Computer Applications, 2022, 42(4): 985-998.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021071267
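Name | Implementation | Advantages | Disadvantages |
---|---|---|---|
Locality-sensitive hashing | Piecewise hashing of the set of raw code bytes | Enables fast approximate lookup in vector space | Limited precision; sensitive to threshold settings |
Fuzzy hashing | Piecewise hashing of the whole file | Accounts for data similarity, enabling fast homology analysis | Cannot identify changes |
Executable file hashing | Hashing of selected parts of the executable file | Strong resistance to obfuscation | High error rate |
Tab. 1 Comparison of binary code similarity search schemes based on hashing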
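Tab. 1 describes the hashing schemes only at the level of the idea. The following Python sketch illustrates the locality-sensitive hashing row using MinHash over overlapping k-byte shingles of raw code, with banded bucketing. MinHash is one common LSH family and is our assumed choice here; it is not necessarily what the surveyed tools use. The shingle size `k=4`, the 64 hash functions, the 16 bands and all helper names (`byte_shingles`, `minhash_signature`, `LSHIndex`) are illustrative assumptions.

```python
import hashlib
from typing import Dict, List, Set, Tuple

def byte_shingles(code: bytes, k: int = 4) -> Set[bytes]:
    """Split raw code bytes into overlapping k-byte shingles."""
    if len(code) < k:
        return {code}
    return {code[i:i + k] for i in range(len(code) - k + 1)}

def minhash_signature(shingles: Set[bytes], num_perm: int = 64) -> List[int]:
    """For each seeded hash function, keep the minimum hash over all shingles."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "little")  # blake2b accepts salts up to 16 bytes
        signature.append(min(
            int.from_bytes(hashlib.blake2b(s, digest_size=8, salt=salt).digest(), "little")
            for s in shingles))
    return signature

def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of agreeing positions estimates the Jaccard similarity of the shingle sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

class LSHIndex:
    """Banded LSH: two signatures become candidates if any whole band matches exactly."""

    def __init__(self, num_perm: int = 64, bands: int = 16) -> None:
        if num_perm % bands:
            raise ValueError("num_perm must be divisible by bands")
        self.num_perm, self.bands, self.rows = num_perm, bands, num_perm // bands
        self.buckets: Dict[Tuple[int, Tuple[int, ...]], Set[str]] = {}
        self.signatures: Dict[str, List[int]] = {}

    def _band_keys(self, sig: List[int]) -> List[Tuple[int, Tuple[int, ...]]]:
        r = self.rows
        return [(b, tuple(sig[b * r:(b + 1) * r])) for b in range(self.bands)]

    def add(self, name: str, code: bytes) -> None:
        sig = minhash_signature(byte_shingles(code), self.num_perm)
        self.signatures[name] = sig
        for key in self._band_keys(sig):
            self.buckets.setdefault(key, set()).add(name)

    def query(self, code: bytes) -> List[Tuple[str, float]]:
        """Return candidate functions ranked by estimated Jaccard similarity."""
        sig = minhash_signature(byte_shingles(code), self.num_perm)
        candidates: Set[str] = set()
        for key in self._band_keys(sig):
            candidates |= self.buckets.get(key, set())
        ranked = [(n, estimate_jaccard(sig, self.signatures[n])) for n in candidates]
        return sorted(ranked, key=lambda t: t[1], reverse=True)
```

The band/row split is where the threshold sensitivity noted in the table comes from:

- **Fewer rows per band** admit more candidates: higher recall, but more false positives.
- **More rows per band** make candidate selection stricter.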
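Name | Implementation | Advantages | Disadvantages |
---|---|---|---|
N-Gram | Produces ordered opcode subsequences of length n over the instruction sequence | Captures the spatial and temporal order of instructions | Ignores operands, so instructions lose part of their semantics |
N-Perm | Produces unordered opcode subsequences of length n over the instruction sequence | Captures instruction reordering within a sequence | Ignores operands, so instructions lose part of their semantics |
Instruction hashing | Hashes instructions after normalization | Fast comparison | Normalization loses part of the instruction semantics |
Instruction alignment | Computes the longest common subsequence of normalized instructions | High coverage of the instruction sequence | Trace length affects the result; functions with few basic blocks (fewer than 3) are strongly affected |
Tab. 2 Comparison of binary code similarity search schemes based on instruction sequence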
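The following Python sketch makes the differences between the rows of Tab. 2 concrete. It builds N-Gram and N-Perm feature sets from opcode sequences and compares them with Jaccard similarity. It also scores instruction alignment as the longest common subsequence (LCS) of normalized instructions.

Several choices are assumptions made for brevity:

- **Normalization rule:** keep the mnemonic and reduce each operand to REG, IMM or MEM. This assumes x86 Intel syntax without size prefixes.
- **Alignment score:** 2·LCS/(|f1|+|f2|).
- **Helper names:** all hypothetical.

```python
import re
from typing import List, Sequence, Set, Tuple

def ngrams(ops: Sequence[str], n: int) -> Set[Tuple[str, ...]]:
    """Ordered windows of n consecutive opcodes (N-Gram)."""
    return {tuple(ops[i:i + n]) for i in range(len(ops) - n + 1)}

def nperms(ops: Sequence[str], n: int) -> Set[Tuple[str, ...]]:
    """Unordered windows: sorting each window hides reordering inside it (N-Perm)."""
    return {tuple(sorted(ops[i:i + n])) for i in range(len(ops) - n + 1)}

def jaccard(a: Set, b: Set) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets give 0.0 (no shared evidence)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def normalize(instr: str) -> str:
    """Keep the mnemonic; abstract each operand to REG, IMM, or MEM (Intel syntax assumed)."""
    mnemonic, _, rest = instr.strip().partition(" ")
    kinds = []
    for operand in (o.strip() for o in rest.split(",")):
        if not operand:
            continue
        if operand.startswith("["):
            kinds.append("MEM")
        elif re.fullmatch(r"-?(0x[0-9a-fA-F]+|\d+)", operand):
            kinds.append("IMM")
        else:
            kinds.append("REG")
    return " ".join([mnemonic] + kinds)

def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Classic dynamic-programming LCS, keeping only the previous row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]

def alignment_score(f1: List[str], f2: List[str]) -> float:
    """LCS of normalized instruction sequences, scaled to [0, 1]."""
    n1, n2 = [normalize(i) for i in f1], [normalize(i) for i in f2]
    total = len(n1) + len(n2)
    return 2 * lcs_length(n1, n2) / total if total else 0.0
```

Take n = 3 and the opcode sequences mov, add, sub and add, mov, sub. They share no N-Gram but have an identical N-Perm, which is the reordering tolerance listed in the table. Normalization likewise discards which register and which constant an instruction uses: `mov eax, [ebp+8]` and `mov ecx, [ebp+8]` both become `mov REG MEM`.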
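Name | Implementation | Advantages | Disadvantages |
---|---|---|---|
Graph similarity | Determines whether one graph contains the other | Similar code yields high graph similarity | High cost |
Path similarity comparison | Compares whether the vertex sequences (paths) of the graphs are similar | Full path coverage | Graph size affects the comparison |
Graph embedding | Represents a graph as a single vector | Fast computation | Limited interpretability |
Tab. 3 Comparison of binary code similarity search schemes based on graph structure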
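Graph embedding in Tab. 3 is usually learned (for example, with a neural network), which is hard to show in a few lines. As a hand-crafted stand-in, the following sketch turns a directed CFG into a Weisfeiler-Lehman (WL) subtree feature histogram and compares two graphs by cosine similarity. Isomorphic CFGs receive identical vectors.

Several choices here are assumptions:

- **Block labels:** every block gets the same label. Real systems often label blocks with richer attributes such as instruction counts, constants or calls.
- **Iterations:** two WL rounds.
- **Label compression:** truncated SHA-1.
- **Helper names:** hypothetical.

```python
import hashlib
import math
from collections import Counter
from typing import Dict, List

def wl_features(succ: Dict[str, List[str]], labels: Dict[str, str], iterations: int = 2) -> Counter:
    """Weisfeiler-Lehman subtree features of a directed CFG.

    Each round relabels a block with its current label plus the sorted labels of
    its successors; the histogram of all labels seen is the graph's feature vector.
    Every node appearing in `succ` must also appear in `labels`.
    """
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        current = {
            v: hashlib.sha1(
                (current[v] + "|" + ",".join(sorted(current[s] for s in succ.get(v, [])))).encode()
            ).hexdigest()[:12]  # compress the signature so labels stay short and comparable
            for v in current
        }
        features.update(current.values())
    return features

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse feature histograms."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0
```

This representation is cheap to compare once computed, which matches the speed advantage in the table. However, a single similarity number does not say which blocks correspond, which reflects the interpretability limitation.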
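Name | Characteristics | Advantages | Disadvantages |
---|---|---|---|
Hash-based | Includes locality-sensitive hashing, fuzzy hashing, executable file hashing and similar techniques | Fast comparison and efficient retrieval and matching | Small changes cause high error rates |
Instruction-based | Includes N-Gram, N-Perm, instruction hashing, instruction alignment and similar techniques | Instruction normalization provides a data basis for later processing | Normalization loses precise instruction semantics, which in turn affects semantic comparison |
Graph-structure-based | Includes graph similarity, path similarity comparison, graph embedding and similar techniques | Graph structures carry more semantic information and are robust | Graph comparison is costly |
Tab. 4 Comparison of binary code similarity search schemes based on syntax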
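Name | Characteristics | Advantages | Disadvantages |
---|---|---|---|
Input/output behavior | Basic blocks are considered functionally equivalent if they produce the same outputs for the same inputs | Can establish semantic equivalence | Outputs must be equal for all inputs |
Symbolic formulas | Expresses each register, memory variable and return value as a symbolic formula | Logically rigorous; can prove semantic equivalence | High computational cost; symbolic normalization loses semantics |
Tab. 5 Comparison of binary code similarity search schemes based on basic block semantics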
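The input/output row of Tab. 5 can be approximated by differential testing: run two basic blocks on the same register states and compare their outputs. The following sketch rests on several assumptions:

- **Block model:** blocks are hypothetical Python functions over a 32-bit register dictionary.
- **Scope:** flags and memory are ignored.
- **Inputs:** a few corner cases plus seeded random values.
- **Helper names:** hypothetical.

Only sampled inputs are checked, so a `True` result is evidence of equivalence rather than the proof over all inputs that the table's last column describes.

```python
import random
from typing import Callable, Dict, Iterable

MASK32 = 0xFFFFFFFF
State = Dict[str, int]
Block = Callable[[State], State]

def io_equivalent(b1: Block, b2: Block, in_regs: Iterable[str], out_regs: Iterable[str],
                  trials: int = 1000, seed: int = 0) -> bool:
    """Run both blocks on identical inputs and compare the chosen output registers.

    Returns False on the first counterexample; True only means that no sampled
    input distinguished the blocks.
    """
    in_regs, out_regs = list(in_regs), list(out_regs)
    rng = random.Random(seed)
    corner_cases = [{r: v for r in in_regs} for v in (0, 1, MASK32)]
    random_cases = [{r: rng.getrandbits(32) for r in in_regs} for _ in range(trials)]
    for state in corner_cases + random_cases:
        o1, o2 = b1(dict(state)), b2(dict(state))  # copy so blocks cannot share state
        if any(o1[r] != o2[r] for r in out_regs):
            return False
    return True

# Hypothetical x86 blocks modeled as state transformers (flags ignored).
def add_eax_eax(s: State) -> State:  # add eax, eax
    s["eax"] = (s["eax"] + s["eax"]) & MASK32
    return s

def shl_eax_1(s: State) -> State:  # shl eax, 1
    s["eax"] = (s["eax"] << 1) & MASK32
    return s

def lea_eax_x3(s: State) -> State:  # lea eax, [eax+eax*2]
    s["eax"] = (s["eax"] * 3) & MASK32
    return s
```

Here `add eax, eax` and `shl eax, 1` are syntactically different but agree on every sampled input. The `lea` variant is rejected as soon as the input 1 yields 2 versus 3.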
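Name | Characteristics | Advantages | Disadvantages |
---|---|---|---|
Embedding comparison | Automatically learns feature vectors from training data; divided into graph embedding and word embedding | Compared objects carry rich semantics; features are extracted automatically; computation is efficient | Code-fragment features learned by embedding are not interpretable |
Machine learning comparison | Treats similarity comparison as a classification or prediction task | The more similar the code, the more accurate the classification or prediction | Cannot recognize code evolution; limited interpretability |
Tab. 6 Comparison of binary code similarity search schemes based on feature semantics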
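Once functions are mapped to vectors by some learned model, embedding comparison in Tab. 6 reduces to nearest-neighbor ranking. The simplest classifier-style decision is then a similarity threshold. The following sketch uses made-up 3-dimensional embeddings and hypothetical function names (`memcpy_arm`, `memcpy_x86`, `strlen_x86`). A real system would obtain the vectors from a trained model and would typically learn the decision rule instead of fixing `threshold=0.9` by hand.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two dense embeddings of equal dimension."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k(query: Sequence[float], db: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Rank database functions by cosine similarity to the query embedding."""
    scored = [(name, cosine(query, vec)) for name, vec in db.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

def is_similar(u: Sequence[float], v: Sequence[float], threshold: float = 0.9) -> bool:
    """Threshold stand-in for a trained pairwise classifier (similar / not similar)."""
    return cosine(u, v) >= threshold

# Made-up embeddings for illustration only.
db = {
    "memcpy_arm": [0.9, 0.1, 0.0],
    "memcpy_x86": [0.8, 0.2, 0.1],
    "strlen_x86": [0.0, 0.3, 0.9],
}
query = [0.85, 0.15, 0.05]
ranking = top_k(query, db)
```

The ranking shows why the table lists interpretability as a weakness: the scores tell which function is closest, but not which instructions or behaviors made it close.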
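Category | Approach | Advantages | Disadvantages |
---|---|---|---|
Syntax-based code search | Focuses on the literal meaning of code; mainly hash-based, instruction-sequence-based and CFG-based search techniques | Handles identical code | Cannot handle code obfuscation; weak robustness in cross-platform search; no high-level semantic information; poor interpretability |
Semantics-based code search | Focuses on how code behavior affects registers or memory offsets; mainly basic block semantic comparison and embedding semantic comparison | Provides basic-block-level semantics, beyond low-level instruction semantics such as opcodes or operands | Basic-block-level semantic understanding is constrained and costly; semantics produced by embeddings are poorly interpretable |
Pragmatics-based code search | Focuses on the semantics of functions and whole programs; mainly debug information recovery and function semantics identification | Provides high-level natural-language semantics and is more interpretable | Relatively low accuracy |
Tab. 7 Comparison of binary code similarity search schemes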