Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (8): 2361-2368. DOI: 10.11772/j.issn.1001-9081.2021061012
Special Issue: Artificial Intelligence
• Artificial Intelligence •
Received: 2021-06-10
Revised: 2021-10-13
Accepted: 2021-10-29
Online: 2022-01-25
Published: 2022-08-10
Contact: Fei LUO
About author: LUO Fei, born in 1978 in Wuhan, Hubei, Ph.D., associate professor, CCF member. His research interests include cognitive computing and reinforcement learning.
Fei LUO, Mengwei BAI. Decision optimization of traffic scenario problem based on reinforcement learning[J]. Journal of Computer Applications, 2022, 42(8): 2361-2368.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021061012
Tab. 1 Definition of taxi action space

| Action value | Meaning |
| --- | --- |
| 0 | Taxi drives north |
| 1 | Taxi drives south |
| 2 | Taxi drives east |
| 3 | Taxi drives west |
| 4 | Pick up the passenger |
| 5 | Drop off the passenger |
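The six discrete actions in Table 1 map directly to integer constants. A minimal Python sketch of that mapping (the enum and member names are illustrative, not from the paper):

```python
from enum import IntEnum

class TaxiAction(IntEnum):
    """Discrete taxi actions as defined in Table 1 (illustrative names)."""
    NORTH = 0     # drive north
    SOUTH = 1     # drive south
    EAST = 2      # drive east
    WEST = 3      # drive west
    PICKUP = 4    # pick up the passenger
    DROPOFF = 5   # drop off the passenger
```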
Tab. 2 Definition of taxi reward function

| Reward item | Reward value |
| --- | --- |
| Each time step | -1 |
| Task completed | 20 |
| Illegal pickup | -10 |
| Illegal drop-off | -10 |
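This reward scheme coincides with the built-in reward of the OpenAI Gym Taxi environment (-1 per step, +20 for a successful delivery, -10 for an illegal pickup or drop-off). A minimal sketch, assuming the classic Gym API (pre-0.26) and the `Taxi-v3` environment ID; note that Gym's default ordering of the movement actions may differ from Table 1:

```python
import gym  # classic Gym API (pre-0.26) assumed

env = gym.make("Taxi-v3")   # 500 discrete states, 6 discrete actions
state = env.reset()

# Take one random action; the returned reward follows Table 2:
# -1 per time step, +20 on successful drop-off, -10 for illegal pickup/drop-off.
action = env.action_space.sample()
next_state, reward, done, info = env.step(action)
print(env.observation_space.n, env.action_space.n, reward)
```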
Tab. 3 Hyperparameter setting for taxis

| Parameter | Value |
| --- | --- |
| Learning rate | 0.6 |
| Discount factor | 0.9 |
| Training episodes | 600 |
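A minimal tabular Q-learning loop using the hyperparameters of Table 3 (learning rate 0.6, discount factor 0.9, 600 episodes) could look as follows; the ε-greedy exploration rate is an added assumption, not a value from the table:

```python
import numpy as np
import gym  # classic Gym API (pre-0.26) assumed

ALPHA, GAMMA, EPISODES = 0.6, 0.9, 600   # values from Table 3
EPSILON = 0.1                            # exploration rate (assumed)

env = gym.make("Taxi-v3")
Q = np.zeros((env.observation_space.n, env.action_space.n))

for episode in range(EPISODES):
    state, done = env.reset(), False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < EPSILON:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done, _ = env.step(action)
        # standard Q-learning update
        Q[state, action] += ALPHA * (
            reward + GAMMA * np.max(Q[next_state]) - Q[state, action]
        )
        state = next_state
```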
Tab. 4 Traffic environment parameters

| Parameter | Value |
| --- | --- |
| Length of each road | 300 m |
| Initial vehicle parking length | 10 m |
| Simulation time | 600 s |
| Vehicles per minute, east-west direction | 40 |
| Vehicles per minute, north-south direction | 4 |
| Vehicle length | 5 m |
| Minimum gap between vehicles | 2 m |
| Maximum vehicle speed | 16.67 m/s |
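A scenario of this kind is typically built in a microscopic traffic simulator such as SUMO. The sketch below generates a SUMO route-file fragment from the vehicle and flow parameters in Table 4; the edge IDs, flow IDs, and file name are illustrative assumptions, and the 300 m road length belongs in the network file rather than the route file:

```python
# Sketch: write a SUMO route file matching the parameters in Table 4.
# Edge IDs ("ew_in", "ew_out", "ns_in", "ns_out") and the file name are assumptions.
VEH_TYPE = '    <vType id="car" length="5" minGap="2" maxSpeed="16.67"/>\n'

# 40 veh/min east-west and 4 veh/min north-south over the 600 s simulation
flows = [
    ("flow_ew", "ew_in", "ew_out", 40 * 60),   # vehsPerHour
    ("flow_ns", "ns_in", "ns_out", 4 * 60),
]

with open("cross.rou.xml", "w") as f:
    f.write("<routes>\n")
    f.write(VEH_TYPE)
    for fid, src, dst, vph in flows:
        f.write(
            f'    <flow id="{fid}" type="car" from="{src}" to="{dst}" '
            f'begin="0" end="600" vehsPerHour="{vph}"/>\n'
        )
    f.write("</routes>\n")
```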
Tab. 5 Definition of traffic signal action space

| Action value | Corresponding signal state |
| --- | --- |
| 0 | GGGGRRGRRGRR |
| 1 | GGRGRGGRRGRR |
| 2 | GGRGRRGGRGRR |
| 3 | GRGGRRGRGGRR |
| 4 | GRGGRRGRRGGR |
| 5 | GRRGGGGRRGRR |
| 6 | GRRGGRGRGGRR |
| 7 | GRRGGRGRRGGR |
| 8 | GRRGRGGRRGRG |
| 9 | GRRGRRGGGGRR |
| 10 | GRRGRRGGRGRG |
| 11 | GRRGRRGRRGGG |
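Each action value in Table 5 thus selects a 12-character signal-state string (G = green, R = red), one character per controlled connection. A hedged sketch of applying such an action through SUMO's TraCI interface; the traffic-light ID `tl0` is an assumption:

```python
import traci  # requires a running SUMO instance started via traci.start(...)

# Action space from Table 5: one signal-state string per discrete action.
PHASES = [
    "GGGGRRGRRGRR", "GGRGRGGRRGRR", "GGRGRRGGRGRR", "GRGGRRGRGGRR",
    "GRGGRRGRRGGR", "GRRGGGGRRGRR", "GRRGGRGRGGRR", "GRRGGRGRRGGR",
    "GRRGRGGRRGRG", "GRRGRRGGGGRR", "GRRGRRGGRGRG", "GRRGRRGRRGGG",
]

def apply_action(action: int, tls_id: str = "tl0") -> None:
    """Map a discrete action index to a SUMO traffic-light state string."""
    traci.trafficlight.setRedYellowGreenState(tls_id, PHASES[action])
```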
Tab. 6 Definition of reward function for traffic signal control

| Reward item | Reward value |
| --- | --- |
| Average waiting time of vehicles on current lanes | - |
| Number of vehicles currently in collision | - |
| Each time step | -0.1 |
| Number of vehicles reaching their destination | 0.1 |
| Traffic signal change | -0.1 |
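One plausible way to assemble the per-step reward from the items in Table 6 via TraCI is sketched below; the lane list, the phase-change detection, and the exact scaling of the two penalty terms listed only as negative in the table are assumptions:

```python
import traci  # requires a running SUMO instance

def compute_reward(lanes: list[str], phase_changed: bool) -> float:
    """Per-step reward assembled from the items in Table 6 (scaling assumed)."""
    # Penalty: average waiting time of vehicles on the monitored lanes
    avg_wait = sum(traci.lane.getWaitingTime(l) for l in lanes) / len(lanes)
    # Penalty: number of vehicles currently involved in a collision
    collisions = traci.simulation.getCollidingVehiclesNumber()
    # Bonus: vehicles that reached their destination in this step
    arrived = traci.simulation.getArrivedNumber()

    reward = -avg_wait - collisions   # negative terms from Table 6
    reward += -0.1                    # per time step
    reward += 0.1 * arrived           # vehicles reaching their destination
    if phase_changed:
        reward += -0.1                # penalty for changing the signal
    return reward
```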
Tab. 7 Final cumulative rewards of traffic signal control obtained by different algorithms

| Reinforcement learning algorithm | Cumulative reward |
| --- | --- |
| Q-Learning | -95.248 |
| SQL | -95.248 |
| GSQL | -27.592 |
| Dyna-Q | -32.144 |
| GSQL-DSEP | -27.392 |
Tab. 8 Total waiting time of vehicles obtained by different algorithms

| Reinforcement learning algorithm | Total vehicle waiting time |
| --- | --- |
| Q-Learning | 2 769 |
| SQL | 2 769 |
| GSQL | 1 344 |
| Dyna-Q | 1 392 |
| GSQL-DSEP | 1 344 |