Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (4): 1158-1170. DOI: 10.11772/j.issn.1001-9081.2025040474
• Cyber security •
Xiaoyu WANG1, Xin LI1,2,3, Di XUE1, Zhangtao JIANG1, Wei WANG1, Yanjun XIAO4
Received: 2025-04-29
Revised: 2025-06-26
Accepted: 2025-06-27
Online: 2025-07-07
Published: 2026-04-10
Contact: Xin LI
About author: WANG Xiaoyu, born in 2001 in Yichang, Hubei, China, M.S. candidate, CCF member. Her research interests include large language models and risk assessment.
Xiaoyu WANG, Xin LI, Di XUE, Zhangtao JIANG, Wei WANG, Yanjun XIAO. Vulnerability classification framework for video surveillance network security based on large language models[J]. Journal of Computer Applications, 2026, 46(4): 1158-1170.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025040474
| Level | Total samples N | Classes | Training samples | Validation samples | Test samples |
|---|---|---|---|---|---|
| DT2-init | 5 676 | 21 | 1 600 | 2 038 | 2 038 |
| DT2-new | 4 742 | 26 | 480 | 2 131 | 2 131 |
| DT3 | 8 427 | 124 | 0 | 0 | 8 427 |
| DT4 | 5 624 | 65 | 0 | 0 | 5 624 |
| DT5 | 3 569 | 36 | 0 | 0 | 3 569 |
Tab. 1 Dataset division details by level
| Actual class | Predicted class | |
|---|---|---|
| | Predicted positive (1) | Predicted negative (0) |
| Actual positive (1) | True positive (TP) | False negative (FN) |
| Actual negative (0) | False positive (FP) | True negative (TN) |
Tab. 2 Confusion matrix of binary classification problem
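The Acc and MCC scores reported in the tables below follow directly from these four confusion-matrix counts. A minimal Python sketch (the function names and the example counts are ours, not from the paper):

```python
import math

def accuracy(tp, fp, fn, tn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; defined as 0.0 when any
    row or column of the confusion matrix is empty."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Example with hypothetical counts:
# accuracy(50, 10, 10, 30) -> 0.8
# mcc(50, 10, 10, 30)      -> ~0.583
```

Unlike plain accuracy, MCC stays near zero for a classifier that ignores a rare class, which is why the paper reports both on these imbalanced CWE datasets.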
| | No. of top-level classes | Test-set classification accuracy | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| (10,5) | 33 | 0.913 | 0.872 | 0.851 | 0.818 | 0.775 | 0.625 | 0.809 | 0.361 |
| (10,10) | 33 | 0.920 | 0.894 | 0.848 | 0.805 | 0.749 | 0.689 | 0.817 | 0.475 |
| (20,8) | 30 | 0.948 | 0.922 | 0.824 | 0.819 | 0.779 | 0.638 | 0.822 | 0.441 |
| (20,16) | 30 | 0.930 | 0.922 | 0.837 | 0.828 | 0.805 | 0.745 | 0.844 | 0.620 |
| (40,8) | 26 | 0.936 | 0.855 | 0.838 | 0.829 | / | 0.799 | 0.852 | 0.613 |
| (40,16) | 26 | 0.940 | 0.884 | 0.863 | 0.858 | / | 0.833 | 0.876 | 0.658 |
| (40,20) | 26 | 0.952 | 0.912 | 0.872 | 0.868 | / | 0.851 | 0.891 | 0.670 |
| (60,18) | 20 | 0.934 | 0.923 | 0.901 | 0.898 | / | 0.886 | 0.908 | 0.758 |
| (60,30) | 20 | 0.940 | 0.933 | 0.902 | 0.882 | / | 0.871 | 0.905 | 0.736 |
Tab. 3 Experimental results with different parameter settings
| Stage | Classifier | Acc/% | MCC/% |
|---|---|---|---|
| Hierarchy judgment | SCP | 85.2 | 37.7 |
| Top-level classification | Text2Weak(3-small) | 39.8 | 38.3 |
| | Text2Weak(3-small)(top5) | 71.6 | 60.5 |
| | SCP | 68.2 | 66.4 |
| | IVCF-LLM(GLM-4-Flash) | 78.5 | 65.8 |
| | IVCF-LLM(GPT-3.5 Turbo) | 85.9 | 78.4 |
| Sub-level classification | SCP | 55.0 | 45.8 |
| | IVCF-LLM(GLM-4-Flash) | 83.7 | 64.5 |
| | IVCF-LLM(GPT-3.5 Turbo) | 86.9 | 67.1 |
| Global classification | Text2Weak(3-small) | 20.3 | 10.6 |
| | Text2Weak(3-small)(top5) | 36.3 | 25.1 |
| | Text2Weak(add-002) | 17.1 | 7.0 |
| | Text2Weak(add-002)(top5) | 27.1 | 14.3 |
| | SCP | 52.9 | 51.5 |
| | Prompt(GPT-3.5 Turbo) | 55.8 | 50.9 |
| | IVCF-LLM(GLM-4-Flash) | 65.2 | 58.3 |
| | IVCF-LLM(GPT-3.5 Turbo) | 75.0 | 65.7 |
Tab. 4 Results of comparison experiments
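The staged scores above reflect a hierarchical pipeline: a top-level CWE class is predicted first, and a sub-level class is then chosen within it, so a global prediction is correct only when every stage is. A schematic sketch of this composition (the stage classifiers here are toy stubs of our own, standing in for the paper's LLM-backed stages):

```python
def classify_hierarchical(description, classify_top, classify_sub):
    """Two-stage prediction: a top-level class first, then a sub-level
    class conditioned on it. An error at either stage propagates to
    the global result."""
    top = classify_top(description)       # e.g. a top-level CWE pillar
    sub = classify_sub(description, top)  # a child class within `top`
    return top, sub

# Toy stubs standing in for LLM-backed stage classifiers:
demo_top = lambda text: "CWE-284" if "access" in text else "CWE-707"
demo_sub = lambda text, top: f"{top}/child"
```

This composition explains why the global Acc rows are consistently lower than the per-stage rows: stage errors compound multiplicatively along the hierarchy.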
| Stage | Classifier | Acc/% | MCC/% |
|---|---|---|---|
| T2 | IVCF-LLM (unoptimized) | 55.9 | 47.8 |
| | IVCF-LLM (component 1 removed) | 79.9 | 69.0 |
| | IVCF-LLM (component 2 removed) | 81.5 | 72.0 |
| | IVCF-LLM (component 3 removed) | 78.5 | 68.7 |
| | IVCF-LLM (component 4 removed) | 74.4 | 63.6 |
| | IVCF-LLM | 85.9 | 78.4 |
| T3 | Standard Prompt | 67.7 | 63.6 |
| | IVCF-LLM | 89.5 | 86.4 |
| T4 | Standard Prompt | 85.0 | 68.8 |
| | IVCF-LLM | 98.0 | 97.4 |
| T5 | Standard Prompt | 89.0 | 75.8 |
| | IVCF-LLM | 99.1 | 98.0 |
Tab. 5 Ablation experimental results
| Level | Total samples N | Classes | Training samples | Validation samples | Test samples |
|---|---|---|---|---|---|
| DT2-init | 14 498 | 21 | 1 600 | 8 240 | 8 240 |
| DT2-new | 1 605 | 26 | 480 | 562 | 563 |
| DT3 | 14 453 | 124 | 0 | 0 | 14 453 |
| DT4 | 6 786 | 65 | 0 | 0 | 6 786 |
| DT5 | 3 564 | 36 | 0 | 0 | 3 564 |
Tab. 6 Matching statistics of cross-scenario vulnerability subset
| Stage | Classifier | Acc/% | MCC/% |
|---|---|---|---|
| Hierarchy judgment | SCP | 85.5 | 37.8 |
| Top-level classification | Text2Weak(3-small) | 38.9 | 37.8 |
| | Text2Weak(3-small)(top5) | 71.8 | 61.0 |
| | SCP | 68.5 | 64.4 |
| | IVCF-LLM(GLM-4-Flash) | 72.1 | 59.8 |
| | IVCF-LLM(GPT-3.5 Turbo) | 79.6 | 68.4 |
| Sub-level classification | SCP | 55.3 | 45.7 |
| | IVCF-LLM(GLM-4-Flash) | 83.7 | 64.5 |
| | IVCF-LLM(GPT-3.5 Turbo) | 86.7 | 67.0 |
| Global classification | Text2Weak(3-small) | 20.3 | 10.6 |
| | Text2Weak(3-small)(top5) | 36.5 | 25.0 |
| | Text2Weak(add-002) | 17.5 | 8.0 |
| | Text2Weak(add-002)(top5) | 27.0 | 14.3 |
| | SCP | 53.2 | 52.6 |
| | Prompt(GPT-3.5 Turbo) | 52.6 | 48.9 |
| | IVCF-LLM(GLM-4-Flash) | 60.2 | 51.3 |
| | IVCF-LLM(GPT-3.5 Turbo) | 69.1 | 60.7 |
Tab. 7 Robustness test results of cross-scenario CWE features
| Task type | No. of calls | Avg. response time/s | Avg. input tokens | Avg. output tokens |
|---|---|---|---|---|
| Skill generation | 24 | 11.69±0.6 | 9 259 | 598 |
| Skill fusion | 4 | 10.83±0.7 | 2 825 | 936 |
Tab. 8 Distribution of resource consumption