Journal of Computer Applications, 2025, Vol. 45, Issue (7): 2288-2295. DOI: 10.11772/j.issn.1001-9081.2024070918
Cyber security
Lixiao ZHANG, Yao MA, Yuli YANG, Dan YU, Yongle CHEN
Received: 2024-07-03
Revised: 2024-10-20
Accepted: 2024-10-21
Online: 2025-07-10
Published: 2025-07-10
Corresponding author: Yongle CHEN
About author: ZHANG Lixiao (born 1999), male, from Lüliang, Shanxi, M. S. candidate, CCF member. His research interests include Internet of Things security.
Citation: Lixiao ZHANG, Yao MA, Yuli YANG, Dan YU, Yongle CHEN. Large-scale IoT binary component identification based on named entity recognition [J]. Journal of Computer Applications, 2025, 45(7): 2288-2295.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024070918
| Semantic information category | Subcategory | Example |
|---|---|---|
| Component version information | | `BusyBox v1.1.2 (2018.10.19-17:25+0000)` |
| Non-component-version information | Dependency library information | `libcrypto.so.1.0.0` |
| | | `GLIBC_2.0` |
| | | `2.1.117` |
| | | `v5.4` |
| | | `udhcp 0.9.9-pre` |
| | | … |
| | Format strings | `%-8.8s` |
| | | `%4.1f%%` |
| | | `%u blocks (%2.2f%%) reserved for the super user` |
| | | `%-9.9s Link encap:%s` |
| | | `GET %stp://[%s]:%d/%s HTTP/1.1` |
| | | … |
| | Other information | `/opt_2/src/RTL_360_F5S/user/busybox/busybox-1.1.2/e2fsprogs/e2fsck.c` |
| | | `$Id: vi.c,v 1.38 2004/08/19 19:15:06 andersen Exp $` |
| | | `This is not GNU sed version 4.0` |
| | | `some 2.4 kernels do not support blocksizes greater than 4096 using ext3.` |
| | | … |
Tab. 1 Examples of readable strings
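Tab. 1 shows that the readable strings in a firmware binary mix component version banners with library names, format strings and build paths. As a rough illustration of that first filtering step, the Python sketch below pulls out printable ASCII runs (in the spirit of GNU `strings`) and buckets them with hand-written regular expressions. The minimum run length of 4, the patterns and the function names `readable_strings` and `rough_category` are our own assumptions, not rules given in the paper.

```python
import re

# Runs of >= 4 printable ASCII bytes; the length threshold is an assumption
# borrowed from the default of GNU `strings`.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")

# Hypothetical heuristics for the categories in Tab. 1 (not the paper's rules).
FORMAT_SPEC = re.compile(r"%[-+ #0]*\d*(?:\.\d+)?[diouxXfFeEgGcsp%]")
LIBRARY = re.compile(r"^(?:lib[\w+-]+\.so(?:\.\d+)*|GLIBC_[\d.]+)$")
VERSIONISH = re.compile(r"\bv?\d+\.\d+(?:\.\d+)*\b")


def readable_strings(blob: bytes) -> list[str]:
    """Return the printable ASCII runs found in a binary blob."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(blob)]


def rough_category(s: str) -> str:
    """Assign a coarse, Tab. 1 style category to one readable string."""
    if LIBRARY.match(s):
        return "dependency library"
    if FORMAT_SPEC.search(s):
        return "format string"
    if VERSIONISH.search(s):
        return "version candidate"
    return "other"
```

Note that the build path from Tab. 1 is also flagged as a "version candidate" because it contains `busybox-1.1.2`; this kind of noise is what makes purely rule-based filtering unreliable.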
| Semantic information category | Subcategory | Example |
|---|---|---|
| Component version information | | `iptables v1.4.21` |
| Non-component-version information | Prompt messages | `iptables v1.4.21: no command specified` |
| | | `Try 'iptables -h' or 'iptables --help' for more information.` |
| | Basic command usage | `Usage: iptables -[ACD] chain rule-specification [options]` |
| | | `iptables -I chain [rulenum] rule-specification [options]` |
| | | `iptables -R chain rulenum rule-specification [options]` |
| | | `iptables -D chain rulenum [options]` |
| | | … |
| | Detailed command descriptions | `Commands:` |
| | | `Either long or short options are allowed.` |
| | | `--append -A chain Append to chain` |
| | | `--check -C chain Check for the existence of a rule` |
| | | `--delete -D chain Delete matching rule from chain` |
| | | … |
| | Detailed descriptions of available options | `Options:` |
| | | `--ipv4 -4 Nothing (line is ignored by ip6tables-restore)` |
| | | `--ipv6 -6 Error (line is ignored by iptables-restore)` |
| | | `[!] --protocol -p proto protocol: by number or name, eg. 'tcp'` |
| | | … |
Tab. 2 Examples of semantic output for component execution
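In the execution output of Tab. 2, the component name and version appear together on the first line (`iptables v1.4.21: ...`), while the usage and option text carries no version. The sketch below is a minimal rule-based extractor for such banner lines. The paper does not give its regular-expression baseline, so the pattern and the function name `version_lines` are hypothetical illustrations only.

```python
import re

# Hypothetical rule: a banner line starts with a name token, then whitespace,
# an optional "v" and a dotted version number (e.g. "iptables v1.4.21").
VERSION_LINE = re.compile(
    r"^(?P<name>[A-Za-z][\w.+-]*)\s+v?(?P<version>\d+(?:\.\d+)+)\b"
)


def version_lines(output: str) -> list[tuple[str, str]]:
    """Return (name, version) pairs from output lines that look like version banners."""
    found = []
    for line in output.splitlines():
        m = VERSION_LINE.match(line.strip())
        if m:
            found.append((m.group("name"), m.group("version")))
    return found
```

For the Tab. 2 output, only the first line matches; `Usage: iptables ...` is skipped because the name token is followed by a colon. A banner that puts the version before the name, or spells it differently, would be missed entirely, which is the brittleness an NER model is meant to overcome.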
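| Entity type | Entity label | Label description |
|---|---|---|
| Component name | B-ComponentName | Beginning of a "component name" entity |
| | I-ComponentName | Middle or end of a "component name" entity |
| Version number | B-VersionNumber | Beginning of a "version number" entity |
| | I-VersionNumber | Middle or end of a "version number" entity |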
Tab. 3 Entity label description
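The B-/I- labels in Tab. 3 follow the usual BIO scheme: the first token of an entity gets a B- tag, the remaining tokens an I- tag, and everything else `O`. The sketch below encodes annotated spans into BIO tags and decodes tags back into entities. The tag names `COMP` and `VER` and the character-level tokenization are assumptions made for illustration; the paper's actual tokenization is not described here.

```python
# Hypothetical tag names: COMP and VER stand in for the component-name and
# version-number labels of Tab. 3; tokens are single characters here.


def bio_tags(tokens, spans):
    """Label tokens with BIO tags.

    spans: list of (start, end, entity_type) in token indices, end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags


def bio_entities(tokens, tags):
    """Decode BIO tags back into (entity_type, text) pairs."""
    entities, current, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("I-") and current and tag[2:] == etype:
            current.append(tok)  # continue the open entity
            continue
        if current:  # any other tag closes the open entity
            entities.append((etype, "".join(current)))
        if tag.startswith("B-"):
            current, etype = [tok], tag[2:]
        else:  # "O", or an I- tag without a matching B-, counts as outside
            current, etype = [], None
    if current:
        entities.append((etype, "".join(current)))
    return entities
```

For example, `bio_tags(list("iptables v1.4.21"), [(0, 8, "COMP"), (9, 16, "VER")])` tags `i` as `B-COMP`, the space as `O` and `v` as `B-VER`.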
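| Item | Configuration |
|---|---|
| Operating system | Ubuntu 18.04 |
| GPU | NVIDIA GeForce RTX 3080 (10 GB) |
| Programming language | Python 3.9 |
| Deep learning framework | PyTorch 2.1.0 |
| Web crawler framework | Scrapy 2.5 |
| Firmware unpacking | Binwalk 2.3.3 |
| Emulated execution | QEMU 6.1 |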
Tab. 4 Experimental environment parameters
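Tab. 4 lists Binwalk for unpacking firmware and QEMU for emulated execution, which is how execution output like that in Tab. 2 can be collected. Assuming QEMU user-mode emulation (the table only says "QEMU 6.1") and a root filesystem already unpacked, for example with `binwalk -e firmware.bin`, a component could be launched as sketched below. The architecture mapping and the names `QEMU_USER`, `emulation_command` and `run_component` are hypothetical; `-L` is QEMU user mode's option for the directory holding the target's dynamic loader and libraries.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from firmware CPU architecture to a QEMU user-mode binary.
QEMU_USER = {
    "arm": "qemu-arm",
    "aarch64": "qemu-aarch64",
    "mips": "qemu-mips",
    "mipsel": "qemu-mipsel",
}


def emulation_command(arch: str, rootfs: Path, binary: str, args: list[str]) -> list[str]:
    """Build a qemu-user command that runs `binary` from an unpacked rootfs.

    `-L rootfs` makes QEMU resolve the target's loader and shared libraries there.
    """
    target = rootfs / binary.lstrip("/")
    return [QEMU_USER[arch], "-L", str(rootfs), str(target), *args]


def run_component(cmd: list[str], timeout: float = 5.0) -> str:
    """Run the command and return stdout plus stderr (help text often goes to stderr)."""
    proc = subprocess.run(
        cmd, capture_output=True, text=True, errors="replace", timeout=timeout
    )
    return proc.stdout + proc.stderr
```

A caller might try arguments such as `--help` or `--version` in turn and keep whichever run produces output; the timeout guards against components that wait for input or never exit under emulation.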
| Parameter | Value | Parameter | Value |
|---|---|---|---|
| max_length | 100 | dropout | 0.3 |
| LSTM_size | 128 | learning rate | 0.00001 |
| batch_size | 32 | | |
Tab. 5 Model parameters
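The hyperparameters in Tab. 5 can be collected in a small configuration object. Reading `max_length` as the number of input tokens per sequence, each tokenized string is cut or padded to that length before batching, with an attention mask marking the real tokens. The sketch below shows both; the class and function names, the padding id of 0 and the omission of RoBERTa's special start and end tokens are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NerConfig:
    """Hyperparameters from Tab. 5 (field names are our own)."""
    max_length: int = 100
    lstm_size: int = 128
    batch_size: int = 32
    dropout: float = 0.3
    learning_rate: float = 1e-5


def pad_or_truncate(ids: list[int], max_length: int, pad_id: int = 0) -> tuple[list[int], list[int]]:
    """Cut or pad a token-id sequence to max_length; return (ids, attention_mask)."""
    ids = ids[:max_length]
    mask = [1] * len(ids)  # 1 marks a real token
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding
```

Truncating at 100 tokens is usually harmless for version banners, which tend to be short, but it can cut off long help texts like those in Tab. 2.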
| Component identification method | P/% | R/% | F1/% |
|---|---|---|---|
| Regular expressions | 59.69 | 69.75 | 64.33 |
| BERT-BiGRU-CRF | 75.88 | 76.52 | 76.20 |
| BERT-BiLSTM-CRF | 77.31 | 81.32 | 79.26 |
| Proposed model | 89.53 | 85.89 | 87.67 |
Tab. 6 Comparative experimental results of component entity recognition
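The scores in Tab. 6 are precision (P), recall (R) and their harmonic mean F1, in percent. A common way to compute them for NER is at the entity level: a predicted entity counts as correct only if its type and boundaries match a gold entity exactly. The sketch below assumes that strict criterion (the paper's exact matching rule is not stated here) and also recomputes F1 from the reported P and R.

```python
def prf1(gold: set, pred: set) -> tuple[float, float, float]:
    """Entity-level precision, recall and F1 in percent (exact type and span match).

    gold, pred: sets of hashable entities, e.g. (entity_type, start, end) tuples.
    """
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return 100 * p, 100 * r, 100 * f1


def f1_from(p: float, r: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    return 2 * p * r / (p + r)
```

Applying `f1_from` to each row of Tab. 6 reproduces the reported F1 values to two decimals, e.g. 89.53 and 85.89 give 87.67.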
| RoBERTa | CRF | BiLSTM | P/% | R/% | F1/% |
|---|---|---|---|---|---|
| √ | | | 72.25 | 73.89 | 73.06 |
| √ | √ | | 75.16 | 77.62 | 76.37 |
| √ | √ | √ | 89.53 | 85.89 | 87.67 |
Tab. 7 Results of ablation experiments
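The CRF component in Tab. 7 replaces independent per-token argmax decisions with a search for the highest-scoring tag sequence, so transition scores can rule out invalid patterns such as an I- tag directly after `O`. The textbook Viterbi decoding for a linear-chain CRF is sketched below in plain Python. It is a generic illustration, not the paper's implementation; in practice the emission scores would come from the network and the transition scores would be learned.

```python
def viterbi_decode(emissions, transitions, start, end):
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions:   T x K per-token tag scores (e.g. produced by a BiLSTM layer)
    transitions: K x K scores; transitions[i][j] scores tag i followed by tag j
    start, end:  length-K scores for the first and the last tag
    """
    num_tags = len(start)
    score = [start[k] + emissions[0][k] for k in range(num_tags)]
    backpointers = []
    for t in range(1, len(emissions)):
        best_prev, new_score = [], []
        for j in range(num_tags):
            # Best previous tag i for ending in tag j at position t.
            i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            best_prev.append(i)
            new_score.append(score[i] + transitions[i][j] + emissions[t][j])
        backpointers.append(best_prev)
        score = new_score
    final = [score[k] + end[k] for k in range(num_tags)]
    best = max(range(num_tags), key=lambda k: final[k])
    path = [best]
    for best_prev in reversed(backpointers):
        best = best_prev[best]
        path.append(best)
    return path[::-1]
```

With tags O=0, B=1, I=2, a large negative score for O→I and for starting with I, emissions that favour O then I per token are decoded as B then I instead, which is the kind of structural consistency a CRF layer enforces.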
| Component identification method | Number of components identified |
|---|---|
| Regular expressions | 63 |
| VES | 5 |
| FirmUp | 5 |
| FirmSEC | 92 |
| Proposed method | 163 |
Tab. 8 Comparison of component identification quantity
References
[1] FAN L N, LI C L, WU Y C, et al. Survey on IoT device identification and anomaly detection [J]. Journal of Software, 2024, 35(1): 288-308. (in Chinese)
[2] ZHAO B, JI S, XU J, et al. A large-scale empirical analysis of the vulnerabilities introduced by third-party components in IoT firmware [C]// Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2022: 442-454.
[3] KUANG B Y, ZHANG Z B, YANG S Q, et al. HMFuzzer: a human-machine collaboration-based firmware vulnerability mining scheme for IoT devices [J]. Chinese Journal of Computers, 2024, 47(3): 703-716. (in Chinese)
[4] DAVID Y, PARTUSH N, YAHAV E. FirmUp: precise static detection of common vulnerabilities in firmware [J]. ACM SIGPLAN Notices, 2018, 53(2): 392-404.
[5] CHENG Y, YANG S, LANG Z, et al. VERI: a large-scale open-source components vulnerability detection in IoT firmware [J]. Computers and Security, 2023, 126: No.103068.
[6] LI S, WANG Y, DONG C, et al. LibAM: an area matching framework for detecting third-party libraries in binaries [J]. ACM Transactions on Software Engineering and Methodology, 2024, 33(2): No.52.
[7] DONG C, LI S, YANG S, et al. LibvDiff: library version difference guided OSS version identification in binaries [C]// Proceedings of the 46th IEEE/ACM International Conference on Software Engineering. New York: ACM, 2024: No.66.
[8] ZHAN X, FAN L, CHEN S, et al. ATVHunter: reliable version detection of third-party libraries for vulnerability identification in Android applications [C]// Proceedings of the IEEE/ACM 43rd International Conference on Software Engineering. Piscataway: IEEE, 2021: 1695-1707.
[9] XU X, LIU C, FENG Q, et al. Neural network-based graph embedding for cross-platform binary code similarity detection [C]// Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM, 2017: 363-376.
[10] GAO X, WANG S, ZHU J W, et al. Overview of named entity recognition tasks [J]. Computer Science, 2023, 50(6A): No.220200119. (in Chinese)
[11] HAMMERTON J. Named entity recognition with long short-term memory [C]// Proceedings of the 7th Conference on Natural Language Learning at HLT-NAACL. Stroudsburg: ACL, 2003: 172-175.
[12] LAMPLE G, BALLESTEROS M, SUBRAMANIAN S, et al. Neural architectures for named entity recognition [C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2016: 260-270.
[13] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
[14] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding [C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long and Short Papers). Stroudsburg: ACL, 2019: 4171-4186.
[15] WANG H C, HE T T, ZHENG G Y. Contract entity recognition method with lexical boundary information [J]. Computer Engineering and Design, 2024, 45(6): 1757-1763. (in Chinese)
[16] YAN J H, ZONG C Q, XU J A. Nested entity recognition approach in Chinese medical text [J]. Journal of Software, 2024, 35(6): 2923-2935. (in Chinese)
[17] MA J W, WANG T X, JIANG H, et al. Knowledge extraction based on deep semantics analysis towards police dossier [J]. Journal of Computer Research and Development, 2024, 61(5): 1325-1335. (in Chinese)
[18] LI X, GUO Z, WANG W, et al. An intelligent named entity recognition method based on IoT professional knowledge [C]// Proceedings of the 2nd Asia Conference on Information Engineering. Piscataway: IEEE, 2022: 67-71.
[19] WANG Y, WANG Z, LI H, et al. A hybrid Chinese named entity recognition method for Internet of Things [C]// Proceedings of the SPIE 12176, International Conference on Algorithms, Microchips and Network Applications. Bellingham, WA: SPIE, 2022: No.121762A.
[20] WEI H, DIAO H Y, KONG L C, et al. Research on fine-grained named-entity-recognition method for public-opinion texts in Northeast Asia [J]. Computer Engineering, 2024, 50(5): 354-362. (in Chinese)
[21] LU X T, SUN L P, LING C, et al. Named entity recognition of Chinese electronic health records incorporating phonetic and part-of-speech features [J/OL]. Journal of Chinese Computer Systems [2024-04-22]. (in Chinese)
[22] DANG X C, LIU J, DONG X H, et al. Named entity recognition for mechanical equipment failure for imbalanced data [J]. Computer Engineering, 2024, 50(9): 104-112. (in Chinese)
[23] HU X, ZHANG W, LI H, et al. VES: a component version extracting system for large-scale IoT firmwares [C]// Proceedings of the 2020 International Conference on Wireless Algorithms, Systems, and Applications, LNCS 12385. Cham: Springer, 2020: 39-48.