Journal of Computer Applications: Sole Official Website


Research on a user-incentive-based bike-sharing dispatching strategy (CCF Bigdata 2021+139)

石兵1,黄茜子1,宋兆翔2,徐建桥3   

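  1. School of Computer Science and Artificial Intelligence, Wuhan University of Technology
  2. Wuhan University of Technology
  3. Department of Information Security, Naval University of Engineering of the Chinese People's Liberation Army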
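  • Received: 2021-12-15 Revised: 2022-01-18 Accepted: 2022-01-24 Online: 2022-06-08 Published: 2022-06-08
  • Corresponding author: 黄茜子
  • Funding:
    Humanities and Social Sciences Research Project of the Ministry of Education; Post-Funded Project for Philosophy and Social Sciences Research of the Ministry of Education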

Users incentive bike-sharing dispatching: A reinforcement learning method(CCF Bigdata 2021+139)

  • Received: 2021-12-15 Revised: 2022-01-18 Accepted: 2022-01-24 Online: 2022-06-08 Published: 2022-06-08
  • Supported by:
    Humanity and Social Science Youth Research Foundation of Ministry of Education; Philosophy and Social Science Post-Foundation of Ministry of Education

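Abstract: To address the bike-sharing dispatching problem, a user-incentive dispatching strategy is proposed that accounts for budget constraints, users' maximum walking distance, users' spatio-temporal demand, and the dynamically changing distribution of shared bikes, with the goal of improving the platform's long-term user service rate. The strategy comprises a task generation algorithm, a budget allocation algorithm, and a task allocation algorithm. The task generation algorithm uses LSTM to predict users' future bike demand. In the budget allocation algorithm, the platform must allocate budget to each time period in sequence; this is a sequential decision problem, so it is modeled as a Markov decision process, and the deep reinforcement learning algorithm DDPG is used to design the budget allocation policy. In the task allocation algorithm, the budget constraint rules out mainstream bipartite graph matching algorithms, so a greedy matching strategy is used for task allocation instead. Finally, experiments on the Mobike dataset compare the strategy with a dispatching strategy without budget constraints (the platform is not limited by budget and may offer any monetary incentive for users to ride bikes to the target regions), a greedy dispatching strategy, a truck-hauling dispatching strategy, and the case of no dispatching. The results show that the user-incentive dispatching strategy is second only to the strategy without budget constraints, and that it can provide meaningful guidance for bike-sharing dispatching.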

Keywords: bike-sharing dispatching, demand prediction, user incentive, Markov decision, deep reinforcement learning

Abstract: To address the bike-sharing dispatching problem, we consider budget constraints, limits on users' maximum walking distance, users' temporal and spatial demand, and dynamic changes in the distribution of shared bikes, and devise a bike-sharing dispatching strategy with user participation that aims to improve the platform's long-term user service rate. The strategy consists of a task generation algorithm, a budget allocation algorithm, and a task allocation algorithm. The task generation algorithm predicts users' future bike demand with LSTM. In the budget allocation algorithm, allocating budget to each time period in turn is a sequential decision-making problem, so we model it as a Markov decision process. Because the problem has a high-dimensional, continuous state space and a continuous action space, we design the budget allocation strategy with the deep deterministic policy gradient (DDPG) algorithm. In the task allocation algorithm, the budget constraint makes mainstream bipartite graph matching algorithms inapplicable, so we use a greedy matching strategy for task allocation. Finally, we run experiments on the Mobike dataset to evaluate our strategy against the dispatching strategy with unlimited budget, the dispatching strategy with greedy budget allocation, the dispatching strategy under truck hauling, and the situation without dispatching. The results show that our dispatching strategy with user participation outperforms all of these except the dispatching strategy with unlimited budget. The experimental results can provide useful insights for dispatching shared bikes.

Key words: bike-sharing dispatching, demand prediction, user incentive, Markov decision, deep reinforcement learning
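
The abstract names a greedy matching step under a budget constraint but gives no details, so the following is only a minimal sketch of what such a step could look like, not the authors' algorithm. It assumes each candidate pairing of a user with a relocation task carries an extra walking distance and an incentive cost; the names `Candidate`, `greedy_assign`, `walk_m`, and `cost`, and all numbers, are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """Hypothetical user-task pairing (not taken from the paper)."""
    user: str
    task: str
    walk_m: float  # extra walking distance for the user, in metres
    cost: float    # incentive the platform would pay for this pairing


def greedy_assign(
    candidates: List[Candidate], budget: float, max_walk_m: float
) -> Tuple[List[Candidate], float]:
    """Pick cheapest feasible pairings first until the budget runs out."""
    # Drop pairings that exceed the user's maximum walking distance.
    feasible = [c for c in candidates if c.walk_m <= max_walk_m]
    # Cheapest first; ties broken by shorter walk.
    feasible.sort(key=lambda c: (c.cost, c.walk_m))

    used_users, used_tasks = set(), set()
    chosen: List[Candidate] = []
    remaining = budget
    for c in feasible:
        if c.cost > remaining:
            break  # list is sorted by cost, so nothing later is affordable
        if c.user in used_users or c.task in used_tasks:
            continue  # each user and each task is matched at most once
        chosen.append(c)
        used_users.add(c.user)
        used_tasks.add(c.task)
        remaining -= c.cost
    return chosen, remaining


if __name__ == "__main__":
    demo = [
        Candidate("u1", "t1", 200, 2.0),
        Candidate("u1", "t2", 350, 1.0),
        Candidate("u2", "t1", 100, 1.5),
        Candidate("u2", "t2", 900, 0.5),  # too far to walk, filtered out
        Candidate("u3", "t3", 400, 3.0),
    ]
    picked, left = greedy_assign(demo, budget=4.0, max_walk_m=500)
    print([(c.user, c.task) for c in picked], left)
```

In this sketch the per-period `budget` is an input; in the strategy described above it would come from the budget allocation step, which the abstract says is learned with DDPG.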

CLC number: