Cybersecurity named entity recognition method based on large language model linking

doi:10.11772/j.issn.1001-9081.2025070867

Journal of Computer Applications (《计算机应用》), official website



侯迪迪,洪少东,付玉杰,崔允贺,申国伟

Guizhou University

Received: 2025-08-01  Revised: 2025-08-28  Online: 2025-11-05  Published: 2025-11-05
Corresponding author: 侯迪迪


Abstract

Cybersecurity knowledge graphs are an important means of performing deep correlation analysis on unstructured cyber threat intelligence data. Large language models (LLMs) can overcome the limitations of traditional cybersecurity named entity recognition methods, which lack an understanding of textual context, struggle to capture long-distance dependencies, and are not robust to noisy and redundant input. However, a single LLM extracting cybersecurity entities adapts poorly to domains beyond its capability boundary, generalizes insufficiently to out-of-distribution (OOD) data, and is constrained by the cognitive limitations of any one model. To solve these problems, a multi-model collaborative recognition method that integrates fine-tuning with LLM linking was proposed. A base LLM was fine-tuned with Low-Rank Adaptation (LoRA) on a cybersecurity threat intelligence corpus, strengthening the model's understanding of the cybersecurity domain through parameter optimization. The LLMs collaborate through two linking mechanisms: static voting and a dynamic meta-learner. The static mechanism combined the predictions of multiple models by voting; the dynamic mechanism designed a feature-space adaptation function to quantify each model's capability boundary and used meta-learners such as random forests to map input-data features dynamically to model performance. Experimental results show that the proposed method achieves an F1 score of 0.91, 6.73% higher than that of a multi-feature semantic enhancement network method, providing an efficient and scalable solution for named entity recognition in cybersecurity threat intelligence analysis.
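The abstract names the two linking mechanisms but does not specify them further. The following Python sketch illustrates the general idea under loudly stated assumptions: each model emits one BIO tag per token; the static mechanism is a token-wise majority vote; and, for the dynamic mechanism, a nearest-centroid classifier over three hypothetical input features (log length, share of IOC-like tokens, share of all-caps tokens) stands in for a random-forest meta-learner so that only the standard library is needed. The names `static_vote`, `input_features`, `NearestCentroidSelector`, and `dynamic_link`, the entity types, the model names, and the features are all illustrative; none of them is the authors' feature-space adaptation function.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Tags = List[str]  # one BIO tag per token, e.g. ["B-MAL", "O"]; entity types are hypothetical


def static_vote(predictions: Sequence[Tags]) -> Tags:
    """Static linking: token-wise majority vote over several models' tag sequences.

    A tie goes to the tied tag cast by the earliest-listed model.
    """
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all models must tag the same tokens")
    voted: Tags = []
    for i in range(n_tokens):
        column = [p[i] for p in predictions]
        counts = Counter(column)
        top = max(counts.values())
        voted.append(next(tag for tag in column if counts[tag] == top))
    return voted


def input_features(tokens: Sequence[str]) -> List[float]:
    """Hypothetical input features: log length, share of IOC-like tokens, share of all-caps tokens."""
    if not tokens:
        raise ValueError("empty input")
    n = len(tokens)
    ioc_like = sum(1 for t in tokens if "." in t or any(c.isdigit() for c in t))
    all_caps = sum(1 for t in tokens if t.isupper())
    return [math.log1p(n), ioc_like / n, all_caps / n]


class NearestCentroidSelector:
    """Dynamic linking meta-learner: a nearest-centroid stand-in for a random forest.

    fit() averages, per model, the feature vectors of training inputs on which that model
    scored best; predict() routes a new input to the model with the closest centroid.
    """

    def fit(self, X: Sequence[Sequence[float]], best_model: Sequence[str]) -> "NearestCentroidSelector":
        sums: Dict[str, List[float]] = {}
        counts: Counter = Counter()
        for x, name in zip(X, best_model):
            total = sums.setdefault(name, [0.0] * len(x))
            for j, v in enumerate(x):
                total[j] += v
            counts[name] += 1
        self.centroids = {m: [s / counts[m] for s in total] for m, total in sums.items()}
        return self

    def predict(self, x: Sequence[float]) -> str:
        return min(self.centroids, key=lambda m: math.dist(x, self.centroids[m]))


def dynamic_link(tokens: Sequence[str], outputs: Dict[str, Tags],
                 selector: NearestCentroidSelector) -> Tags:
    """Return the tags of the model the meta-learner expects to do best on this input."""
    return outputs[selector.predict(input_features(tokens))]


if __name__ == "__main__":
    tokens = ["APT28", "used", "X-Agent", "against", "10.0.0.5"]
    outputs = {  # made-up predictions from three hypothetical fine-tuned models
        "llm_a": ["B-ORG", "O", "B-MAL", "O", "B-IP"],
        "llm_b": ["B-ORG", "O", "O", "O", "B-IP"],
        "llm_c": ["O", "O", "B-MAL", "O", "O"],
    }
    print(static_vote(list(outputs.values())))  # → ['B-ORG', 'O', 'B-MAL', 'O', 'B-IP']

    # Toy meta-training set: feature vectors labelled with the model that scored best on them.
    selector = NearestCentroidSelector().fit(
        [[1.0, 0.8, 0.1], [1.1, 0.7, 0.2], [3.0, 0.0, 0.0], [3.2, 0.05, 0.0]],
        ["llm_a", "llm_a", "llm_c", "llm_c"],
    )
    print(selector.predict(input_features(tokens)))  # → llm_a
    print(dynamic_link(tokens, outputs, selector))
```

Token-wise voting can produce invalid BIO sequences (for example an I- tag following O), so a practical pipeline would likely vote at the entity-span level or repair the sequence afterwards. Where scikit-learn is available, `sklearn.ensemble.RandomForestClassifier` could replace `NearestCentroidSelector` as the meta-learner, trained on the same (feature vector, best model) pairs.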

CLC number: TP391.1

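Cite this article: 侯迪迪, 洪少东, 付玉杰, 崔允贺, 申国伟. Cybersecurity named entity recognition method based on large language model linking[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025070867.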
