Journal of Computer Applications (《计算机应用》), sole official website


Large-scale IoT binary component identification based on named entity recognition

ZHANG Lixiao (张立孝)1, MA Yao (马垚)2, YANG Yuli (杨玉丽)2, YU Dan (于丹)2, CHEN Yongle (陈永乐)2

  1. College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology
  2. Taiyuan University of Technology
  • Received: 2024-07-03 Revised: 2024-10-20 Online: 2024-11-19 Published: 2024-11-19
  • Corresponding author: ZHANG Lixiao
  • Supported by:
    Shanxi Provincial Research Foundation for Basic Research

Abstract: In firmware development, IoT device manufacturers commonly reuse large numbers of open-source components compiled from open-source code, and a single firmware image typically comprises hundreds of such components. If these components are not updated promptly, unpatched open-source components can carry vulnerabilities into the firmware and thereby expose IoT devices to security risks. Identifying the binary components within IoT firmware is therefore crucial to securing IoT devices. To address the difficulty that existing methods have in identifying binary components at large scale, a large-scale IoT binary component identification method based on Named Entity Recognition (NER) is proposed. The method first extracts the internal binary components by unpacking the firmware. It then obtains semantic information about each component in two ways: by extracting readable strings and by executing the component. Finally, it uses a RoBERTa-BiLSTM-CRF NER model to identify component names and version numbers. In experiments on 6,575 firmware images released by 12 popular IoT manufacturers, the method achieves an F1 score of 87.67% and successfully identifies 163 binary components. The results show that the method effectively expands the range of binary components that can be identified in IoT firmware and helps secure firmware from the perspective of the software supply chain.

Key words: Internet of Things (IoT), software supply chain, component identification, firmware security, Named Entity Recognition (NER)
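
As an illustration of two steps named in the abstract, the following minimal sketch shows how readable strings might be pulled from a binary component and how BIO-style NER tags might be decoded into a component name and version number. It is not the authors' implementation: the function names `extract_strings` and `decode_entities`, the minimum string length of 4, the whitespace tokenization, and the `NAME`/`VER` label set are all assumptions made for the example, and the tags are written by hand where a trained RoBERTa-BiLSTM-CRF model would normally supply them.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII bytes, similar in spirit to the Unix `strings` tool.

    The default min_len of 4 is an assumption, not a value taken from the paper.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def decode_entities(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Group BIO tags into (label, text) spans.

    The labels (for example NAME and VER) are hypothetical; a real model
    defines its own label set.
    """
    entities: list[tuple[str, str]] = []
    label, span = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; flush any entity that was still open.
            if label is not None:
                entities.append((label, " ".join(span)))
            label, span = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # Continuation of the currently open entity.
            span.append(token)
        else:
            # "O" tag or an inconsistent "I-" tag closes the open entity.
            if label is not None:
                entities.append((label, " ".join(span)))
            label, span = None, []
    if label is not None:
        entities.append((label, " ".join(span)))
    return entities


# Hand-made byte blob standing in for a binary component extracted from firmware.
blob = b"\x00\x01BusyBox v1.31.1 (2020-05-12)\x00\xff\x7fab\x00"
strings_found = extract_strings(blob)

# Whitespace tokenization and hand-written tags stand in for the NER model's output.
tokens = strings_found[0].split()
tags = ["B-NAME", "B-VER", "O"]
print(decode_entities(tokens, tags))
```

In this toy input the only run of four or more printable bytes is the banner string, so the decoder receives three tokens and returns one name span and one version span. Real firmware strings are far noisier, which is the motivation the abstract gives for using a learned NER model instead of fixed patterns.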

CLC number: