Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (5): 1490-1499. DOI: 10.11772/j.issn.1001-9081.2021030486
• Cyber security •
					
Mo LI, Tianliang LU, Ziheng XIE
Received: 2021-03-31
Revised: 2021-06-23
Accepted: 2021-06-25
Online: 2022-06-11
Published: 2022-05-10
Contact: Tianliang LU
About author: LI Mo, born in 1995 in Ganzhou, Jiangxi, M. S. candidate. His research interests include malware detection and machine learning.
Mo LI, Tianliang LU, Ziheng XIE. Android malware family classification method based on code image integration[J]. Journal of Computer Applications, 2022, 42(5): 1490-1499.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021030486
| File size | Image width/pixel |
|---|---|
| <10 KB | 64 |
| [10 KB,30 KB) | 128 |
| [30 KB,60 KB) | 214 |
| [60 KB,100 KB) | 280 |
| [100 KB,200 KB) | 400 |
| [200 KB,500 KB) | 600 |
| [500 KB,1 MB) | 864 |
| [1 MB,2 MB) | 1 280 |
| [2 MB,3 MB) | 1 600 |
| [3 MB,4 MB] | 1 920 |
| >4 MB | 2 048 |
Tab. 1 Ratio of code image conversion
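The size-to-width mapping in Tab. 1 can be sketched as a simple lookup. This is a minimal illustration rather than the authors' code: it assumes 1 KB = 1024 bytes, and the helper names `image_width` and `bytes_to_rows` are hypothetical. `bytes_to_rows` uses the common byte-per-pixel convention for malware images (each byte becomes one grayscale pixel, with the last row zero-padded), which the table itself does not specify.

```python
from bisect import bisect_right

KB = 1024          # assumption: binary kilobytes
MB = 1024 * KB

# Lower bounds (in bytes) of every interval after the first, and the widths from Tab. 1.
_BOUNDS = [10 * KB, 30 * KB, 60 * KB, 100 * KB, 200 * KB, 500 * KB,
           1 * MB, 2 * MB, 3 * MB]
_WIDTHS = [64, 128, 214, 280, 400, 600, 864, 1280, 1600, 1920]


def image_width(size_bytes: int) -> int:
    """Return the image width in pixels for a file of the given size."""
    if size_bytes > 4 * MB:            # the last row of Tab. 1 is open-ended
        return 2048
    # bisect_right makes every interval closed on the left: [lower, upper)
    return _WIDTHS[bisect_right(_BOUNDS, size_bytes)]


def bytes_to_rows(data: bytes) -> list[list[int]]:
    """Lay the bytes out row by row, one byte per pixel (illustrative convention)."""
    width = image_width(len(data))
    padded = data + bytes(-len(data) % width)   # zero-pad the final row
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]
```

For example, a 100-byte input falls in the first interval, so it is laid out as two rows of 64 pixels, the second padded with zeros.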
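| Transformation method | Parameter value |
|---|---|
| Center rotation | [-0.1,0.1] |
| Horizontal shift | [-0.1,0.1] |
| Vertical shift | [-0.1,0.1] |
| Center zoom | [-0.2,0.2] |
| Flip mode | Horizontal flip |
| Fill mode | Fixed 224 interpolation fill |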
Tab. 2 Parameter values of data augmentation
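As a rough illustration of how the ranges in Tab. 2 might be used, the sketch below draws one random set of augmentation parameters. Uniform sampling, the 0.5 flip probability, and the names `AugmentParams` and `sample_params` are assumptions made for this example; the table does not state the units of the ranges or how they are sampled.

```python
import random
from dataclasses import dataclass

# Ranges copied from Tab. 2; their units are not stated in the table.
ROTATION_RANGE = (-0.1, 0.1)
SHIFT_RANGE = (-0.1, 0.1)
ZOOM_RANGE = (-0.2, 0.2)
TARGET_SIZE = 224  # from the "fixed 224" fill mode


@dataclass
class AugmentParams:
    rotation: float
    shift_x: float
    shift_y: float
    zoom: float
    flip_horizontal: bool
    size: int = TARGET_SIZE


def sample_params(rng: random.Random) -> AugmentParams:
    """Draw one set of augmentation parameters (uniform sampling is an assumption)."""
    return AugmentParams(
        rotation=rng.uniform(*ROTATION_RANGE),
        shift_x=rng.uniform(*SHIFT_RANGE),
        shift_y=rng.uniform(*SHIFT_RANGE),
        zoom=rng.uniform(*ZOOM_RANGE),
        flip_horizontal=rng.random() < 0.5,  # flip probability is an assumption
    )
```

Passing a seeded `random.Random` keeps the sampled parameters reproducible across runs.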
| Family | Total samples (original) | Total samples (augmented) | Training samples (original) | Training samples (augmented) | Test samples |
|---|---|---|---|---|---|
| FakeInstaller | 925 | 1 185 | 740 | 1 000 | 185 |
| DroidKungFu | 666 | 1 133 | 533 | 1 000 | 133 |
| Plankton | 625 | 1 125 | 500 | 1 000 | 125 |
| Opfake | 613 | 1 123 | 490 | 1 000 | 123 |
| GinMaster | 339 | 1 068 | 271 | 1 000 | 68 |
| BaseBridge | 329 | 1 066 | 263 | 1 000 | 66 |
| Iconosys | 152 | 1 030 | 122 | 1 000 | 30 |
| Kmin | 147 | 1 029 | 118 | 1 000 | 29 |
| FakeDoc | 132 | 1 027 | 105 | 1 000 | 27 |
| Geinimi | 92 | 1 019 | 73 | 1 000 | 19 |
Tab. 3 Experimental dataset
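The counts in Tab. 3 imply that each family's training set is topped up to 1 000 images while the test set is left unchanged. The sketch below works through that arithmetic; the function name `augmentation_plan` is hypothetical, and the code only restates the table rather than reproducing the authors' pipeline.

```python
# (family, total, original training samples, test samples) copied from Tab. 3
ROWS = [
    ("FakeInstaller", 925, 740, 185),
    ("DroidKungFu", 666, 533, 133),
    ("Plankton", 625, 500, 125),
    ("Opfake", 613, 490, 123),
    ("GinMaster", 339, 271, 68),
    ("BaseBridge", 329, 263, 66),
    ("Iconosys", 152, 122, 30),
    ("Kmin", 147, 118, 29),
    ("FakeDoc", 132, 105, 27),
    ("Geinimi", 92, 73, 19),
]
TARGET_TRAIN = 1000  # augmented training-set size per family in Tab. 3


def augmentation_plan(rows, target=TARGET_TRAIN):
    """Map each family to the number of extra training images needed to reach the target."""
    plan = {}
    for family, total, train, test in rows:
        assert train + test == total, family  # the original split must add up
        plan[family] = max(target - train, 0)
    return plan


plan = augmentation_plan(ROWS)
```

Adding each family's entry in `plan` to its original total gives the augmented totals listed in the table, for instance 925 + 260 = 1 185 for FakeInstaller.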
| Interpolation algorithm | Size | F1/% | Accuracy/% | Time/s |
|---|---|---|---|---|
| Nearest | | 97.76 | 98.23 | 1.64 |
| Box | | 98.10 | 98.54 | 7.53 |
| Lanczos | | 98.21 | 98.63 | 18.47 |
| Bicubic | | 98.80 | 98.94 | 10.03 |
| Bilinear | | 98.81 | 98.97 | 5.03 |
| Bicubic (EfficientNet) | | 95.95 | 96.48 | 10.03 |
| Bilinear (EfficientNet) | | 94.68 | 95.35 | 5.03 |
Tab. 4 Performance comparison of different interpolation algorithms
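Tab. 4 compares resampling filters used to bring code images to a fixed size. For reference, here is a minimal pure-Python sketch of bilinear interpolation on a grayscale image stored as a list of rows. It uses an align-corners coordinate mapping for simplicity, which is an assumption and may differ from the pixel-center convention of image libraries; it is not the implementation benchmarked in the table.

```python
def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D grayscale image (list of equal-length rows) with bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    # Align-corners mapping: output corners land exactly on input corners.
    sy = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    sx = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
    out = []
    for i in range(out_h):
        y = i * sy
        y0 = int(y)                    # upper neighbor row
        y1 = min(y0 + 1, in_h - 1)     # lower neighbor row, clamped at the border
        wy = y - y0                    # vertical blend weight
        row = []
        for j in range(out_w):
            x = j * sx
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Blend horizontally on both rows, then vertically between them.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

Each output pixel is a weighted average of at most four input pixels, which is consistent with bilinear resampling costing less time than the wider bicubic and Lanczos kernels in the table.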
| Sample balancing method | Precision/% | Recall/% | F1/% |
|---|---|---|---|
| Original data | 97.82 | 98.48 | 98.15 |
| Data augmentation | 98.43 | 97.91 | 98.17 |
| CB Loss | 98.11 | 98.57 | 98.34 |
| Data augmentation + CB Loss | 98.87 | 98.75 | 98.81 |
Tab. 5 Performance comparison of different sample balancing methods
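CB Loss in Tab. 5 refers to the class-balanced loss, which reweights each class by the inverse of its effective number of samples, $E_n = (1-\beta^n)/(1-\beta)$. The sketch below computes such weights for the original training counts of Tab. 3. The value `beta=0.999` and the normalization (weights summing to the number of classes) are assumptions for illustration; this page does not state the settings used in the experiments.

```python
def class_balanced_weights(counts, beta=0.999):
    """Per-class weights proportional to (1 - beta) / (1 - beta ** n)."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)  # normalize so the weights sum to the class count
    return [w * scale for w in raw]


# Original training-set sizes per family, in the row order of Tab. 3
TRAIN_COUNTS = [740, 533, 500, 490, 271, 263, 122, 118, 105, 73]
weights = class_balanced_weights(TRAIN_COUNTS)
```

Families with fewer training samples receive larger weights, so a small family such as Geinimi contributes more per sample to the loss than a large family such as FakeInstaller.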
| Base network | Number of layers | Precision/% | Recall/% | F1/% | Accuracy/% |
|---|---|---|---|---|---|
| ResNet | 50 | 97.38 | 96.81 | 97.09 | 97.87 |
| ResNet | 101 | 96.88 | 97.14 | 97.01 | 97.80 |
| ResNeXt | 50 | 97.65 | 97.71 | 97.68 | 97.98 |
| ResNeXt | 101 | 97.64 | 97.40 | 97.52 | 98.19 |
| SENet | 50 | 97.83 | 97.81 | 97.82 | 98.18 |
| SENet | 101 | 97.96 | 97.94 | 97.95 | 98.27 |
| SKNet | 50 | 98.50 | 97.76 | 98.13 | 98.39 |
| SKNet | 101 | 97.54 | 98.22 | 97.88 | 98.31 |
| ResNeSt | 50 | 98.14 | 98.66 | 98.40 | 98.61 |
| ResNeSt | 101 | 98.41 | 98.33 | 98.37 | 98.57 |
| STResNeSt | 50 | 98.87 | 98.75 | 98.81 | 98.97 |
| STResNeSt | 101 | 98.81 | 98.69 | 98.75 | 98.95 |
Tab. 6 Performance comparison of different residual networks
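The F1 column in Tab. 6 appears to be the harmonic mean of the reported precision and recall. The short check below recomputes it for three rows; the function name `f1_score` is chosen for this example, and only the rows listed here are checked.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# (network, layers, precision/%, recall/%, reported F1/%) copied from Tab. 6
ROWS = [
    ("ResNet", 50, 97.38, 96.81, 97.09),
    ("ResNeSt", 50, 98.14, 98.66, 98.40),
    ("STResNeSt", 50, 98.87, 98.75, 98.81),
]

for name, layers, p, r, reported in ROWS:
    print(f"{name}-{layers}: computed {f1_score(p, r):.2f}, reported {reported:.2f}")
```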
| Code image generation method | Precision/% | Recall/% | F1/% | Accuracy/% |
|---|---|---|---|---|
| JAR | 94.59 | 94.43 | 94.51 | 94.97 |
| JAR (character filtering) | 95.73 | 95.29 | 95.51 | 96.01 |
| XML | 95.96 | 96.38 | 96.17 | 96.92 |
| DEX (grayscale image) | 95.53 | 95.83 | 95.68 | 96.04 |
| DEX | 96.79 | 96.69 | 96.74 | 97.09 |
| Integrated image | 98.87 | 98.75 | 98.81 | 98.97 |
Tab. 7 Performance comparison of different code image generation methods