Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (6): 1689-1694. DOI: 10.11772/j.issn.1001-9081.2021061424

• Papers of the 2021 National Annual Conference on Open Distributed and Parallel Computing (DPCS 2021)

Recommendation model of penetration path based on reinforcement learning

Haini ZHAO1,2, Jian JIAO1,2

  1. Computer School, Beijing Information Science and Technology University, Beijing 100101, China
  2. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (Beijing Information Science and Technology University), Beijing 100101, China
  • Received: 2021-08-09 Revised: 2021-10-16 Accepted: 2021-10-29 Online: 2022-01-10 Published: 2022-06-10
  • Corresponding author: Jian JIAO
  • About author: ZHAO Haini, born in 1997 in Fuyang, Anhui, M.S. candidate, CCF member. Her research interests include network security and penetration testing.
  • Supported by:
    Opening Project of Beijing Key Laboratory of Internet Culture and Digital Dissemination Research (ICDDXN006)

Abstract:

The core problem in penetration testing is planning the penetration path. Manual planning relies on the experience of testers, while automated path generation depends mainly on prior knowledge of network security and on specific vulnerabilities or network scenarios, which makes it costly and inflexible. To address these problems, a reinforcement learning-based penetration path recommendation model named Q Learning Penetration Test (QLPT) is proposed. Through multiple rounds of vulnerability selection and reward feedback, QLPT outputs the optimal penetration path for the target. Penetration experiments on an open-source cyber range show that the paths recommended by QLPT are highly consistent with the paths found by manual penetration testing, which verifies the feasibility and accuracy of the model. Compared with the automated penetration testing framework Metasploit, QLPT is also more flexible in adapting to all penetration scenarios.

Key words: penetration test, reinforcement learning, Q learning, strategic planning
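
The loop of multi-round vulnerability selection and reward feedback described in the abstract can be illustrated with plain tabular Q-learning. The sketch below is not the authors' QLPT implementation. The attack graph, the vulnerability labels, the reward values, and the hyperparameters (`alpha`, `gamma`, `epsilon`, `episodes`) are all hypothetical and chosen only to make the example small and runnable.

```python
import random

# Hypothetical attack graph: state -> {vulnerability: (next_state, reward)}.
# Every name and number here is an illustrative assumption, not data from the paper.
GRAPH = {
    "start": {"web_sqli": ("web_shell", -1.0), "ssh_bruteforce": ("start", -5.0)},
    "web_shell": {"kernel_privesc": ("web_root", -1.0), "weak_sudo": ("web_root", -3.0)},
    "web_root": {"db_default_creds": ("goal", 10.0), "smb_exploit": ("start", -5.0)},
}
GOAL = "goal"


def train(graph, start="start", episodes=1000, alpha=0.5, gamma=0.9,
          epsilon=0.2, seed=0):
    """Learn a Q-table by repeatedly picking a vulnerability and observing a reward."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in acts} for s, acts in graph.items()}
    for _ in range(episodes):
        state = start
        for _ in range(20):  # cap the number of exploit attempts per episode
            actions = list(graph[state])
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore a random vulnerability
            else:
                action = max(actions, key=lambda a: q[state][a])  # exploit best known
            next_state, reward = graph[state][action]
            # The goal state is terminal, so it contributes no future value.
            future = 0.0 if next_state == GOAL else max(q[next_state].values())
            # Standard Q-learning update.
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
            if state == GOAL:
                break
    return q


def recommend_path(graph, q, start="start", max_steps=10):
    """Follow the greedy policy of the learned Q-table from the start state."""
    path, state = [], start
    for _ in range(max_steps):
        if state == GOAL:
            break
        action = max(graph[state], key=lambda a: q[state][a])
        path.append(action)
        state = graph[state][action][0]
    return path


if __name__ == "__main__":
    print(recommend_path(GRAPH, train(GRAPH)))
```

In this toy setting, failed or costly exploit attempts carry negative rewards and reaching the target carries a positive one, so the greedy path read from the Q-table favors the cheapest chain of vulnerabilities that reaches the goal.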

CLC number: