Journal of Computer Applications, 2022, Vol. 42, Issue (6): 1689-1694. DOI: 10.11772/j.issn.1001-9081.2021061424
National Open Distributed and Parallel Computing Conference 2021 (DPCS 2021)
Haini ZHAO1,2, Jian JIAO1,2
Received: 2021-08-09
Revised: 2021-10-16
Accepted: 2021-10-29
Online: 2022-01-10
Published: 2022-06-10
Contact: Jian JIAO
About author: ZHAO Haini, born in 1997 in Fuyang, Anhui, M. S. candidate, CCF member. Her research interests include network security and penetration testing.
Haini ZHAO, Jian JIAO. Recommendation model of penetration path based on reinforcement learning[J]. Journal of Computer Applications, 2022, 42(6): 1689-1694.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021061424
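Symbol | Meaning |
---|---|
V | Set of exploitable vulnerabilities |
$v_i$ | Vulnerability i, described as (vulnerability identifier, vulnerability location, vulnerability level, data) |
S | Set of states describing how much the model knows about the penetration target |
$s_t$ | State at time t, described as (location, privilege, data) |
 | Expected state |
A | Action set |
 | Action executed at time t, described in the form … |
T | Vulnerability exploitation tool |
Resu | Vulnerability exploitation result |
R | Reward value, in {0,1} |
∂ | Random selection probability |
π | Policy (decision rule) |
γ | Discount factor |
 | Value of exploiting vulnerability $v_i$ in state $s_t$ under policy π |
 | Value of exploiting vulnerability $v_i$ in state $s_t$ under the optimal policy |
MaxEpisodes | Maximum number of learning episodes |
Path | Penetration path, composed of a series of vulnerability exploitation actions, i.e. … |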
Tab. 1 Main symbols and their definitions
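For orientation, the symbols in Tab. 1 fit the standard one-step Q-learning update of Watkins and Dayan. The block below is that textbook rule written in the table's notation, not a formula recovered from this article. The learning rate $\alpha$, the next state $s_{t+1}$ and the function name $Q$ are assumptions that do not appear in Tab. 1.

```latex
Q(s_t, v_i) \leftarrow Q(s_t, v_i)
  + \alpha \left[ R + \gamma \max_{v_j \in V} Q(s_{t+1}, v_j) - Q(s_t, v_i) \right]
```

As a worked instance under assumed values $\alpha = 0.5$ and $\gamma = 0.9$: starting from $Q(s_t, v_i) = 0$, a successful exploit with $R = 1$ that leads to a state whose values are all 0 gives $Q(s_t, v_i) = 0 + 0.5\,(1 + 0.9 \cdot 0 - 0) = 0.5$.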
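Symbol | Meaning |
---|---|
root | System administrator, who manages all resources, including system devices, system files and system processes |
user | Any ordinary system user, generated at system initialization or created by the system administrator, with independent private resources |
access | Remote visitor who can access network services, usually a trusted visitor, able to exchange data with network service processes and to scan system information |
none | Remote visitor without any privileges, including users who are untrusted or isolated outside the firewall |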
Tab. 2 System access rights description
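The four access levels in Tab. 2 can be treated as an ordered scale, which makes it easy to check whether an exploit raised the attacker's privilege. The sketch below is one possible encoding. The numeric values, the class name `Privilege` and the helper `is_escalation` are illustrative assumptions, not part of the article.

```python
from enum import IntEnum


class Privilege(IntEnum):
    """Access levels from Tab. 2; the numeric ordering is an assumption."""
    NONE = 0    # remote visitor without any privileges
    ACCESS = 1  # remote visitor who can reach network services
    USER = 2    # ordinary system user
    ROOT = 3    # system administrator


def is_escalation(before: Privilege, after: Privilege) -> bool:
    """Return True when the privilege level increased."""
    return after > before
```

For example, moving from `Privilege.ACCESS` to `Privilege.USER` counts as an escalation, while staying at `Privilege.USER` does not.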
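QLPT module | Function description |
---|---|
Vulnerability selection | Based on the current state … |
Vulnerability exploitation | Based on the current vulnerability … |
Reward conversion | Based on the current state … |
Q-learning | Based on the state … |
State-vulnerability Q-table | Stores the value of each state-vulnerability pair |
State update | Updates the state according to the vulnerability exploitation result Resu |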
Tab. 3 Function description of QLPT internal modules
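How the modules in Tab. 3 might cooperate can be sketched as a small tabular Q-learning loop. This is a toy illustration under stated assumptions, not the authors' implementation. The `TRANSITIONS` map is invented, exploitation is simulated rather than run with a real tool, and the learning rate `alpha`, the step cap and all function names are hypothetical. State indexes follow Tab. 5 and vulnerability identifiers follow Tab. 4.

```python
import random

VULNS = ["CWE-20", "CWE-22", "CWE-89", "CWE-307", "CWE-79"]
GOAL = 4  # assumed index of the expected state

# Hypothetical transitions: (state, vulnerability) -> next state.
# Invented for illustration; unlisted pairs leave the state unchanged.
TRANSITIONS = {
    (0, "CWE-22"): 1,
    (1, "CWE-307"): 2,
    (2, "CWE-20"): 3,
    (3, "CWE-89"): 4,
}


def select_vuln(q, state, explore_prob, rng):
    """Vulnerability selection: random with explore_prob, else highest Q-value."""
    if rng.random() < explore_prob:
        return rng.choice(VULNS)
    return max(VULNS, key=lambda v: q[(state, v)])


def exploit(state, vuln):
    """Vulnerability exploitation (simulated): return the resulting state."""
    return TRANSITIONS.get((state, vuln), state)


def train(max_episodes=200, alpha=0.5, gamma=0.9, explore_prob=0.2, seed=0):
    """Fill the state-vulnerability Q-table with one-step Q-learning."""
    rng = random.Random(seed)
    q = {(s, v): 0.0 for s in range(GOAL + 1) for v in VULNS}
    for _ in range(max_episodes):
        state = 0
        for _ in range(100):  # step cap per episode (assumption)
            vuln = select_vuln(q, state, explore_prob, rng)
            next_state = exploit(state, vuln)        # state update from the result
            reward = 1 if next_state == GOAL else 0  # reward conversion, R in {0, 1}
            best_next = max(q[(next_state, v)] for v in VULNS)
            q[(state, vuln)] += alpha * (reward + gamma * best_next - q[(state, vuln)])
            state = next_state
            if state == GOAL:
                break
    return q


def recommend_path(q):
    """Follow the greedy policy from state 0 and list the chosen vulnerabilities."""
    state, path = 0, []
    while state != GOAL and len(path) < 10:
        vuln = max(VULNS, key=lambda v: q[(state, v)])
        path.append(vuln)
        state = exploit(state, vuln)
    return path
```

Calling `recommend_path(train())` returns the greedy sequence of vulnerabilities learned for the invented transition map. With a real target, `exploit` would be replaced by a call to an exploitation tool and the transitions would be observed rather than given.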
Vulnerability identifier | Vulnerability type | Vulnerability location | Vulnerability level | Data |
---|---|---|---|---|
CWE-20 | File upload | Input file | 7.5 | webshell |
CWE-22 | Directory scanning | ./ | 5.3 | Sensitive files |
CWE-89 | SQL injection | ss_id | 10.0 | webshell |
CWE-307 | Brute force | login | 5.3 | Password list |
CWE-79 | XSS | user_email | 5.3 | Cookie |
Tab. 4 Vulnerability information description
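The rows of Tab. 4 map naturally onto a small record type. The sketch below is one way to hold them in code. The class name `Vulnerability`, its field names and the helper `by_level` are illustrative assumptions, while the values are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vulnerability:
    identifier: str  # vulnerability identifier
    kind: str        # vulnerability type
    location: str    # vulnerability location
    level: float     # vulnerability level
    data: str        # data column of Tab. 4


VULNERABILITIES = [
    Vulnerability("CWE-20", "file upload", "Input file", 7.5, "webshell"),
    Vulnerability("CWE-22", "directory scanning", "./", 5.3, "sensitive files"),
    Vulnerability("CWE-89", "SQL injection", "ss_id", 10.0, "webshell"),
    Vulnerability("CWE-307", "brute force", "login", 5.3, "password list"),
    Vulnerability("CWE-79", "XSS", "user_email", 5.3, "Cookie"),
]


def by_level(vulns):
    """Return the vulnerabilities ordered from highest to lowest level."""
    return sorted(vulns, key=lambda v: v.level, reverse=True)
```

Sorting by level puts CWE-89 first and CWE-20 second; the three entries rated 5.3 keep their table order because Python's sort is stable.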
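Index | State | Location | Privilege | Data |
---|---|---|---|---|
0 | Exploitable vulnerability set obtained | web | access | Vulnerability information |
1 | Back-end entry obtained | web | user | Back-end URL |
2 | User information obtained | web | user | User information |
3 | webshell obtained | web | user | Administrator information |
4 | Privilege escalation succeeded | server | root | All information |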
Tab. 5 State set indexes
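Each state in Tab. 5 is a (location, privilege, data) triple with an integer index, which is the form a Q-table lookup needs. The sketch below shows one possible encoding. The names `State`, `STATES`, `STATE_INDEX` and `is_expected` are hypothetical, and treating index 4 as the expected state is an assumption based on its description.

```python
from typing import NamedTuple


class State(NamedTuple):
    location: str
    privilege: str
    data: str


# Rows of Tab. 5 in index order.
STATES = [
    State("web", "access", "vulnerability information"),
    State("web", "user", "back-end URL"),
    State("web", "user", "user information"),
    State("web", "user", "administrator information"),
    State("server", "root", "all information"),
]

# Reverse lookup from a state triple to its index.
STATE_INDEX = {state: index for index, state in enumerate(STATES)}

# Assumption: the expected state is the last row (privilege escalation succeeded).
EXPECTED_STATE = STATES[4]


def is_expected(state: State) -> bool:
    """Return True when the given state equals the assumed expected state."""
    return state == EXPECTED_STATE
```

Because states 1 to 3 share the same location and privilege, the data field is what distinguishes them in the lookup.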
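Framework/model | Generality across attack scenarios | Feedback-based update of vulnerability exploitation conditions | Tester needs no vulnerability exploitation knowledge |
---|---|---|---|
Metasploit | × | × | √ |
QLPT | √ | √ | √ |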
Tab. 6 Comparison of QLPT and Metasploit