Journal of Computer Applications (《计算机应用》): sole official website

Cybersecurity named entity recognition method based on large language model linking

Hou Didi (侯迪迪), Hong Shaodong (洪少东), Fu Yujie (付玉杰), Cui Yunhe (崔允贺), Shen Guowei (申国伟)

  1. Guizhou University
  • Received: 2025-08-01 Revised: 2025-08-28 Online: 2025-11-05 Published: 2025-11-05
  • Corresponding author: Hou Didi (侯迪迪)

Abstract: Cybersecurity knowledge graphs are an important means of performing deep correlation analysis on unstructured cyber threat intelligence data. Large language models (LLMs) can overcome the limitations of traditional cybersecurity named entity recognition methods, namely limited understanding of textual context, difficulty in capturing long-distance dependencies, and a lack of robustness to noise and redundancy. However, a single LLM applied to cybersecurity entity extraction adapts poorly to domains outside its capability boundary, generalizes insufficiently to out-of-distribution (OOD) data, and is constrained by the cognitive limitations of a single model. To address these problems, a multi-model collaborative linking method that integrates fine-tuned LLMs was proposed. A base LLM was fine-tuned with Low-Rank Adaptation (LoRA) on a cybersecurity threat intelligence corpus, so that parameter optimization strengthened the model's understanding of the cybersecurity domain. LLM collaboration adopted two linking mechanisms: a static mechanism that combined the predictions of multiple models by voting, and a dynamic mechanism that designed a feature space adaptation function to quantify the models' capability boundaries and used meta-learners such as random forests to map input data features dynamically to model performance. Experimental results show that the proposed method achieves an F1 score of 0.91, which is 6.73% higher than the multi-feature semantic enhancement network method, providing an efficient and scalable solution for cybersecurity named entity recognition.
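
The static voting mechanism mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the voting granularity, the tag scheme, or the tie-breaking rule, so everything below is an assumption made for illustration: votes are taken per token over BIO-style tags, ties go to the earliest model in the list, and the function name `majority_vote`, the tag names, and the sample sentence are all hypothetical. The dynamic meta-learner mechanism is not covered here.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token tag sequences from several models by majority vote.

    predictions[m][t] is the tag that model m assigns to token t.
    Ties are broken in favour of the earliest model in the list
    (an assumption; the abstract does not state a tie-breaking rule).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must tag the same number of tokens")

    voted = []
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        best = max(counts.values())
        # The first tag, in model order, that reaches the top count wins.
        voted.append(next(tag for tag in token_tags if counts[tag] == best))
    return voted


# Hypothetical sentence and hypothetical outputs from three fine-tuned models.
tokens = ["MalwareX", "contacts", "example.com", "."]
model_outputs = [
    ["B-MALWARE", "O", "B-DOMAIN", "O"],  # model A
    ["B-MALWARE", "O", "O", "O"],         # model B
    ["B-TOOL", "O", "B-DOMAIN", "O"],     # model C
]

for token, tag in zip(tokens, majority_vote(model_outputs)):
    print(f"{token}\t{tag}")
```

In this toy input, models A and B outvote model C on the first token and models A and C outvote model B on the third, so the combined sequence keeps both entity tags. Voting on whole entity spans instead of individual tokens would be an equally plausible reading of the abstract.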

CLC number: