Journal of Computer Applications, 2022, Vol. 42, Issue (5): 1490-1499. DOI: 10.11772/j.issn.1001-9081.2021030486
• Cyber security •
Mo LI, Tianliang LU, Ziheng XIE
Received: 2021-03-31
Revised: 2021-06-23
Accepted: 2021-06-25
Online: 2022-06-11
Published: 2022-05-10
Contact: Tianliang LU
About author: LI Mo (born 1995), male, from Ganzhou, Jiangxi, M. S. candidate. His research interests include malware detection and machine learning.
Mo LI, Tianliang LU, Ziheng XIE. Android malware family classification method based on code image integration[J]. Journal of Computer Applications, 2022, 42(5): 1490-1499.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021030486
File size | Image width/pixels |
---|---|
<10 KB | 64 |
[10 KB,30 KB) | 128 |
[30 KB,60 KB) | 214 |
[60 KB,100 KB) | 280 |
[100 KB,200 KB) | 400 |
[200 KB,500 KB) | 600 |
[500 KB,1 MB) | 864 |
[1 MB,2 MB) | 1 280 |
[2 MB,3 MB) | 1 600 |
[3 MB,4 MB] | 1 920 |
>4 MB | 2 048 |
Tab. 1 Ratio of code image conversion
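The size-to-width rule in Tab. 1 can be read as a simple bucket lookup. The following is a minimal Python sketch of that lookup, not the authors' code. The function name `image_width` is hypothetical, and the assumption that 1 KB equals 1024 bytes is ours, since the table does not define the unit.

```python
import bisect

KB = 1024          # assumption: 1 KB = 1024 bytes (not stated in Tab. 1)
MB = 1024 * KB

# Exclusive upper bound of each bucket below 3 MB, in bytes.
_BOUNDS = [10 * KB, 30 * KB, 60 * KB, 100 * KB, 200 * KB,
           500 * KB, 1 * MB, 2 * MB, 3 * MB]
# Width for each bucket; the last entry covers the closed range [3 MB, 4 MB].
_WIDTHS = [64, 128, 214, 280, 400, 600, 864, 1280, 1600, 1920]


def image_width(size_bytes: int) -> int:
    """Return the code image width (pixels) for a file of the given size."""
    if size_bytes < 0:
        raise ValueError("file size must be non-negative")
    if size_bytes > 4 * MB:
        return 2048
    # bisect_right sends a size equal to a bound into the next bucket,
    # which matches the half-open intervals [lower, upper) in the table.
    return _WIDTHS[bisect.bisect_right(_BOUNDS, size_bytes)]
```

For example, a 45 KB file falls in [30 KB, 60 KB) and maps to a width of 214 pixels, while a file of exactly 4 MB still maps to 1920.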
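Transformation | Parameter value |
---|---|
Center rotation | [-0.1,0.1] |
Horizontal shift | [-0.1,0.1] |
Vertical shift | [-0.1,0.1] |
Center zoom | [-0.2,0.2] |
Flip mode | Horizontal flip |
Fill mode | Fixed-224 interpolation fill |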
Tab. 2 Parameter values of data augmentation
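One way to read Tab. 2 is that each augmented image draws one value per transformation from the listed range. The sketch below illustrates that reading only. The names `AugmentParams` and `sample_params`, the uniform sampling, and the 0.5 flip probability are all assumptions; the table gives neither the units of the ranges nor the library used to apply them.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AugmentParams:
    rotation: float        # drawn from [-0.1, 0.1]; unit not stated in Tab. 2
    shift_x: float         # horizontal shift, drawn from [-0.1, 0.1]
    shift_y: float         # vertical shift, drawn from [-0.1, 0.1]
    zoom: float            # center zoom, drawn from [-0.2, 0.2]
    flip_horizontal: bool  # Tab. 2 lists horizontal flip as the flip mode


def sample_params(rng: random.Random) -> AugmentParams:
    """Draw one set of augmentation parameters within the Tab. 2 ranges."""
    return AugmentParams(
        rotation=rng.uniform(-0.1, 0.1),
        shift_x=rng.uniform(-0.1, 0.1),
        shift_y=rng.uniform(-0.1, 0.1),
        zoom=rng.uniform(-0.2, 0.2),
        flip_horizontal=rng.random() < 0.5,  # assumed probability
    )
```

Passing a seeded `random.Random` keeps the augmentation reproducible across runs.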
Family | Total samples (original) | Total samples (augmented) | Training samples (original) | Training samples (augmented) | Test samples |
---|---|---|---|---|---|
FakeInstaller | 925 | 1 185 | 740 | 1 000 | 185 |
DroidKungFu | 666 | 1 133 | 533 | 1 000 | 133 |
Plankton | 625 | 1 125 | 500 | 1 000 | 125 |
Opfake | 613 | 1 123 | 490 | 1 000 | 123 |
GinMaster | 339 | 1 068 | 271 | 1 000 | 68 |
BaseBridge | 329 | 1 066 | 263 | 1 000 | 66 |
Iconosys | 152 | 1 030 | 122 | 1 000 | 30 |
Kmin | 147 | 1 029 | 118 | 1 000 | 29 |
FakeDoc | 132 | 1 027 | 105 | 1 000 | 27 |
Geinimi | 92 | 1 019 | 73 | 1 000 | 19 |
Tab. 3 Experimental dataset
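The counts in Tab. 3 follow a simple pattern: each family's training set is brought up to 1 000 images, and the test set is left unchanged. The Python sketch below reproduces that arithmetic for a few families. The function name `augmentation_plan` is hypothetical, and the code only restates the table; it does not generate any images.

```python
TARGET_TRAIN = 1000  # augmented training-set size per family, from Tab. 3

# family -> (original training samples, test samples), copied from Tab. 3
DATASET = {
    "FakeInstaller": (740, 185),
    "DroidKungFu": (533, 133),
    "GinMaster": (271, 68),
    "Geinimi": (73, 19),
}


def augmentation_plan(dataset, target=TARGET_TRAIN):
    """Return, per family, how many extra images are needed and the new total."""
    plan = {}
    for family, (train, test) in dataset.items():
        extra = max(target - train, 0)  # images to add to the training set
        plan[family] = {
            "extra_images": extra,
            "total_after": train + extra + test,
        }
    return plan
```

Under this reading, FakeInstaller needs 260 additional training images (total 1 185) and Geinimi needs 927 (total 1 019), which matches the augmented totals in the table.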
Interpolation algorithm | Size | F1/% | Accuracy/% | Time/s |
---|---|---|---|---|
Nearest | | 97.76 | 98.23 | 1.64 |
Box | | 98.10 | 98.54 | 7.53 |
Lanczos | | 98.21 | 98.63 | 18.47 |
Bicubic | | 98.80 | 98.94 | 10.03 |
Bilinear | | 98.81 | 98.97 | 5.03 |
Bicubic (EfficientNet) | | 95.95 | 96.48 | 10.03 |
Bilinear (EfficientNet) | | 94.68 | 95.35 | 5.03 |
Tab. 4 Performance comparison of different interpolation algorithms
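Bilinear interpolation gives the best F1 and accuracy in Tab. 4 at a moderate cost. As an illustration of what that resampling step computes, here is a simplified pure-Python sketch for a grayscale image stored as a list of rows. It is a textbook version with pixel-center alignment and no antialiasing, written under our own assumptions; the function name `resize_bilinear` is hypothetical, and image libraries typically apply extra filtering when downscaling.

```python
def resize_bilinear(img, new_h, new_w):
    """Resize a 2-D grayscale image (list of equal-length rows) bilinearly."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(new_h):
        # Map the output pixel center back into source coordinates.
        y = (i + 0.5) * h / new_h - 0.5
        y = min(max(y, 0.0), h - 1)          # clamp to the image border
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0                          # vertical blend factor
        row = []
        for j in range(new_w):
            x = (j + 0.5) * w / new_w - 0.5
            x = min(max(x, 0.0), w - 1)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0                      # horizontal blend factor
            # Blend along x on the two neighboring rows, then along y.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out
```

Each output pixel is a weighted average of at most four source pixels, which is why this method is cheaper than bicubic or Lanczos resampling, both of which use larger neighborhoods.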
Sample balancing method | Precision/% | Recall/% | F1/% |
---|---|---|---|
Original data | 97.82 | 98.48 | 98.15 |
Data augmentation | 98.43 | 97.91 | 98.17 |
CB Loss | 98.11 | 98.57 | 98.34 |
Data augmentation + CB Loss | 98.87 | 98.75 | 98.81 |
Tab. 5 Performance comparison of different sample balancing methods
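CB Loss in Tab. 5 is the class-balanced loss, which reweights each class by the inverse of its effective number of samples, (1 - β^n) / (1 - β), where n is the class size. The sketch below computes such weights in Python. The function name `class_balanced_weights`, the value β = 0.999, and the normalization (weights summing to the number of classes) are assumptions for illustration; the hyperparameters actually used are not given here.

```python
def class_balanced_weights(counts, beta=0.999):
    """Per-class weights proportional to 1 / effective number of samples.

    counts: number of training samples in each class.
    beta:   hyperparameter in [0, 1); 0.999 is an assumed example value.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must be in [0, 1)")
    # Inverse effective number: (1 - beta) / (1 - beta ** n)
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    # Assumed convention: rescale so the weights sum to the number of classes.
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]
```

Using two original training counts from Tab. 3 purely as example inputs, `class_balanced_weights([740, 73])` assigns the larger weight to the 73-sample class, so errors on rare families contribute more to the loss.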
Base network | Number of layers | Precision/% | Recall/% | F1/% | Accuracy/% |
---|---|---|---|---|---|
ResNet | 50 | 97.38 | 96.81 | 97.09 | 97.87 |
ResNet | 101 | 96.88 | 97.14 | 97.01 | 97.80 |
ResNeXt | 50 | 97.65 | 97.71 | 97.68 | 97.98 |
ResNeXt | 101 | 97.64 | 97.40 | 97.52 | 98.19 |
SENet | 50 | 97.83 | 97.81 | 97.82 | 98.18 |
SENet | 101 | 97.96 | 97.94 | 97.95 | 98.27 |
SKNet | 50 | 98.50 | 97.76 | 98.13 | 98.39 |
SKNet | 101 | 97.54 | 98.22 | 97.88 | 98.31 |
ResNeSt | 50 | 98.14 | 98.66 | 98.40 | 98.61 |
ResNeSt | 101 | 98.41 | 98.33 | 98.37 | 98.57 |
STResNeSt | 50 | 98.87 | 98.75 | 98.81 | 98.97 |
STResNeSt | 101 | 98.81 | 98.69 | 98.75 | 98.95 |
Tab. 6 Performance comparison of different residual networks
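The F1 column in these tables is consistent with the harmonic mean of the reported precision and recall. A short Python check, using a hypothetical helper `f1_score` and two rows of Tab. 6 as inputs:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ResNet-50 row of Tab. 6: precision 97.38 %, recall 96.81 %
print(round(f1_score(97.38, 96.81), 2))  # 97.09
# STResNeSt-50 row of Tab. 6: precision 98.87 %, recall 98.75 %
print(round(f1_score(98.87, 98.75), 2))  # 98.81
```

Both results agree with the F1 values listed for those rows.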
Code image generation method | Precision/% | Recall/% | F1/% | Accuracy/% |
---|---|---|---|---|
JAR | 94.59 | 94.43 | 94.51 | 94.97 |
JAR (character filtering) | 95.73 | 95.29 | 95.51 | 96.01 |
XML | 95.96 | 96.38 | 96.17 | 96.92 |
DEX (grayscale image) | 95.53 | 95.83 | 95.68 | 96.04 |
DEX | 96.79 | 96.69 | 96.74 | 97.09 |
Integrated image | 98.87 | 98.75 | 98.81 | 98.97 |
Tab. 7 Performance comparison of different code image generation methods