Journal of Computer Applications, 2025, Vol. 45, Issue (7): 2288-2295. DOI: 10.11772/j.issn.1001-9081.2024070918

• Cyberspace security •

Large-scale IoT binary component identification based on named entity recognition

Lixiao ZHANG, Yao MA, Yuli YANG, Dan YU, Yongle CHEN

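  1. College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
  • Received: 2024-07-03 Revised: 2024-10-20 Accepted: 2024-10-21 Online: 2025-07-10 Published: 2025-07-10
  • Corresponding author: Yongle CHEN
  • About the authors: ZHANG Lixiao, born in 1999, male, from Lüliang, Shanxi, M. S. candidate, CCF member. His main research interest is Internet of Things security.
    MA Yao, born in 1982, male, from Taiyuan, Shanxi, lecturer, Ph. D., CCF member. His main research interest is Internet of Things security.
    YANG Yuli, born in 1979, female, from Linfen, Shanxi, lecturer, Ph. D., CCF member. Her main research interests are cloud security and blockchain.
    YU Dan, born in 1983, female, from Taiyuan, Shanxi, lecturer, Ph. D., CCF member. Her main research interest is Internet of Things security.
    CHEN Yongle, born in 1983, male, from Weifang, Shandong, professor, Ph. D., CCF member. His main research interest is Internet of Things security. chenyongle@tyut.edu.cn
  • Funding:
    Basic Research Program of Shanxi Province (20210302124395)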

Large-scale IoT binary component identification based on named entity recognition

Lixiao ZHANG, Yao MA, Yuli YANG, Dan YU, Yongle CHEN

  1. College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Jinzhong, Shanxi 030600, China
  • Received: 2024-07-03 Revised: 2024-10-20 Accepted: 2024-10-21 Online: 2025-07-10 Published: 2025-07-10
  • Contact: Yongle CHEN
  • About the authors: ZHANG Lixiao, born in 1999, M. S. candidate. His research interests include Internet of Things security.
    MA Yao, born in 1982, Ph. D., lecturer. His research interests include Internet of Things security.
    YANG Yuli, born in 1979, Ph. D., lecturer. Her research interests include trusted cloud service computing and blockchain.
    YU Dan, born in 1983, Ph. D., lecturer. Her research interests include Internet of Things security.
    CHEN Yongle, born in 1983, Ph. D., professor. His research interests include Internet of Things security.
  • Supported by:
    Basic Research Program of Shanxi Province (20210302124395)

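Abstract:

Internet of Things (IoT) device vendors commonly reuse large numbers of open-source components, compiled from open-source code, during firmware development, and a single firmware image typically consists of hundreds of such components. If these components are not updated in time, open-source components lacking security patches may be integrated into the firmware with their vulnerabilities intact, leaving latent security risks in IoT devices. Identifying the binary components in IoT firmware is therefore essential to ensuring IoT device security. To address the difficulty existing methods have in identifying binary components at scale, a large-scale IoT binary component identification method based on Named Entity Recognition (NER) is proposed. First, the binary components inside the firmware are extracted by unpacking the firmware. Then, semantic information about each component is obtained in two ways: extracting readable strings and executing the component. Finally, a RoBERTa-BiLSTM-CRF NER model is used to identify component names and version numbers. Experimental results on 6 575 firmware images released by 12 popular IoT manufacturers show that the proposed method achieves an F1 score of 87.67% and successfully identifies 163 binary components. The method thus effectively broadens the range of binary components that can be identified in IoT firmware and helps safeguard firmware security from the software supply chain perspective.

Keywords: Internet of Things, software supply chain, component identification, firmware security, named entity recognition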

Abstract:

Internet of Things (IoT) device manufacturers often reuse a large number of open-source components compiled from open-source code in firmware development, with each firmware typically comprising hundreds of such components. If these components are not updated promptly, components without security patches may carry their vulnerabilities into the firmware, thereby posing security risks to IoT devices. Therefore, identifying binary components in IoT firmware is crucial for ensuring the security of IoT devices. To address the difficulty that existing methods have in identifying binary components on a large scale, a large-scale IoT binary component identification method based on Named Entity Recognition (NER) was proposed. Firstly, the binary components inside the firmware were extracted by unpacking the firmware. Then, semantic information about each component was obtained in two ways: extracting readable strings and executing the component. Finally, a RoBERTa-BiLSTM-CRF NER model was used to identify component names and version numbers. Experimental results on 6 575 firmware samples released by 12 popular IoT manufacturers demonstrate that the proposed method achieves an F1 value of 87.67% and successfully identifies 163 binary components. The method thus effectively expands the identification range of binary components in IoT firmware and helps secure firmware from the perspective of the software supply chain.

Key words: Internet of Things (IoT), software supply chain, component identification, firmware security, Named Entity Recognition (NER)
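
The readable-string step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `extract_strings` and `version_candidates` are hypothetical, the 4-character minimum is an assumption borrowed from the default of the Unix `strings` tool, and the regular expression is only a simple stand-in for the RoBERTa-BiLSTM-CRF NER model, which the sketch does not attempt to reproduce.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return every run of at least `min_len` printable ASCII bytes in `data`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]


# Assumption: a crude heuristic for version-like tokens such as "1.31.1" or
# "1.0.2k", optionally prefixed with "v". It only stands in for the NER model.
VERSION_RE = re.compile(r"(?<![\w.])v?(\d+(?:\.\d+){1,3}[a-z]?)\b")


def version_candidates(strings: list[str]) -> list[tuple[str, str]]:
    """Pair each string that contains a version-like token with that token."""
    found = []
    for s in strings:
        match = VERSION_RE.search(s)
        if match:
            found.append((s, match.group(1)))
    return found


if __name__ == "__main__":
    # Made-up bytes imitating fragments of a binary component.
    blob = (b"\x7fELF\x00\x01\x02BusyBox v1.31.1 (2020-01-01)\x00"
            b"\xffOpenSSL 1.0.2k  26 Jan 2017\x00")
    for text, version in version_candidates(extract_strings(blob)):
        print(version, "<-", text)
```

A pattern-based filter like this is easy to run over many binaries, but it cannot tell which string names the component itself; that gap is the part the abstract assigns to the NER model.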

CLC number: