Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (5): 1620-1624. DOI: 10.11772/j.issn.1001-9081.2022040630
Special Issue: Frontier and Comprehensive Applications
Xiaohui HUANG, Kaiming YANG, Jiahao LING
Received: 2022-05-06
Revised: 2022-07-11
Accepted: 2022-07-13
Online: 2022-08-05
Published: 2023-05-10
Contact: Kaiming YANG
About author: HUANG Xiaohui, born in 1984, Ph.D., associate professor, CCF member. His research interests include deep learning and intelligent transportation.
Xiaohui HUANG, Kaiming YANG, Jiahao LING. Order dispatching by multi-agent reinforcement learning based on shared attention[J]. Journal of Computer Applications, 2023, 43(5): 1620-1624.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022040630
| Map size | Vehicle-passenger combination | Random | Greedy | DQN | QMIX | SARL | Improvement rate/% |
|---|---|---|---|---|---|---|---|
| 100×100 (training map) | P=7, C=2 | 3 386.25 | 3 526.96 | 3 306.88 | | 2 981.38 | 7.37 |
| | P=10, C=10 | 2 210.87 | 2 208.55 | 2 102.65 | | 1 912.15 | 6.34 |
| | P=11, C=13 | 2 089.87 | 2 089.63 | 2 046.65 | | 1 742.73 | 10.71 |
| | P=9, C=4 | 2 958.86 | 3 072.81 | 2 763.20 | | 2 523.49 | 7.38 |
| | P=10, C=2 | 4 644.59 | 4 934.91 | 4 847.97 | | 4 214.86 | 3.28 |
| | P=25, C=20 | 2 962.79 | 3 173.54 | 2 853.66 | | 2 109.84 | 18.03 |
| 10×10 | P=7, C=2 | 337.30 | 348.70 | 323.90 | | 295.38 | 6.67 |
| | P=10, C=10 | 215.64 | 209.07 | 206.46 | | 179.10 | 13.25 |
| | P=11, C=13 | 208.77 | 199.75 | 197.53 | | 181.11 | 8.22 |
| | P=9, C=4 | 287.57 | 303.86 | 265.10 | | 247.01 | 6.82 |
| | P=10, C=2 | 448.38 | 474.44 | 454.27 | | 392.16 | 6.16 |
| | P=25, C=20 | 291.62 | 287.84 | 285.62 | | 230.93 | 18.42 |
| 500×500 | P=7, C=2 | 17 092.40 | 17 251.20 | 16 473.10 | | 14 916.60 | 8.34 |
| | P=10, C=10 | 10 860.21 | 10 720.60 | 10 139.36 | | 10 021.45 | 0.98 |
| | P=11, C=13 | 10 428.24 | 10 950.80 | 9 968.44 | | 9 098.43 | 7.50 |
| | P=9, C=4 | 14 715.82 | 15 582.50 | 13 571.88 | | 12 182.71 | 10.08 |
| | P=10, C=2 | 23 303.64 | 24 491.40 | 23 688.50 | | 20 910.92 | 4.39 |
| | P=25, C=20 | 14 820.33 | 16 046.20 | 14 649.64 | | 11 902.55 | 6.89 |

Tab. 1 Experimental comparison on maps of different sizes (Random-SARL columns: dispatch duration/s)
| Method | 10×10 grid (Pmax=10, Cmax=10) | 500×500 grid (Pmax=20, Cmax=20) |
|---|---|---|
| Random | 209.03 | 13 700.14 |
| Greedy | 201.85 | 13 871.07 |
| DQN | 199.43 | 13 462.82 |
| QMIX | | |
| SARL | 183.36 | 12 653.74 |
| Improvement rate/% | 6.28 | 1.24 |

Tab. 2 Comparison of efficiency with variable vehicle and passenger combinations (Random-SARL rows: dispatch duration/s)
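The improvement-rate rows above presumably report SARL's relative reduction in dispatch duration against a baseline method; the exact baseline used is not recoverable from this page. A minimal sketch of that kind of calculation, assuming the definition (baseline − SARL) / baseline × 100 (the function name and the example durations are illustrative, not taken from the tables):

```python
def improvement_rate(baseline_s: float, sarl_s: float) -> float:
    """Relative reduction in dispatch duration, in percent.

    Assumes the tables' improvement rate is defined as
    (baseline - SARL) / baseline * 100; the baseline method
    actually used by the paper is an assumption here.
    """
    return (baseline_s - sarl_s) / baseline_s * 100.0

# Hypothetical durations, for illustration only:
print(round(improvement_rate(2000.0, 1800.0), 2))  # 10.0
```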