Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (10): 2876-2882. DOI: 10.11772/j.issn.1001-9081.2019030507

• Artificial Intelligence •



  • Corresponding author: HU Xuemin
  • About the authors: ZHOU Wan (1997-), female, born in Xiangyang, Hubei; main research interest: deep reinforcement learning. HU Xuemin (1985-), male, born in Yueyang, Hunan, associate professor, Ph.D.; main research interests: machine learning, motion planning. SHI Chenyin (1999-), female, born in Shiyan, Hubei; main research interest: motion planning. WEI Jieling (1997-), female, born in Wuhan, Hubei; main research interest: deep learning. TONG Xiuchi (1996-), female, born in Suizhou, Hubei, M.S. candidate; main research interest: machine learning.

Motion planning algorithm of robot for crowd evacuation based on deep Q-network

ZHOU Wan, HU Xuemin, SHI Chenyin, WEI Jieling, TONG Xiuchi   

  1. School of Computer Science and Information Engineering, Hubei University, Wuhan Hubei 430062, China
  • Received:2019-03-27 Revised:2019-05-28 Online:2019-10-10 Published:2018-04-11
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61806076), the Natural Science Foundation of Hubei Province (2018CFB158), and the Undergraduate Innovation and Entrepreneurship Training Program of Hubei Province (201810512055).



Abstract: To address the danger and poor efficiency of evacuating dense crowds from public places in emergencies, a motion planning algorithm based on Deep Q-Network (DQN) was proposed for crowd evacuation robots. Firstly, a human-robot social force model was constructed by adding a human-robot interaction force to the original social force model, so that the robot could influence the motion state of the crowd through the force it exerts on pedestrians. Then, a robot motion planning algorithm was designed based on DQN: images of the raw pedestrian motion state were fed into the network, which output the robot's motion behavior, and the designed reward function was fed back to the network so that the robot could learn autonomously in the closed "environment-behavior-reward" loop. Finally, after many iterations, the robot learned the optimal motion strategy for each initial position, maximizing the total number of people evacuated. The proposed algorithm was trained and evaluated in a constructed simulation environment. Experimental results show that, compared with crowd evacuation without a robot, the DQN-based algorithm increases evacuation efficiency by 16.41%, 10.69% and 21.76% at three different initial positions respectively, indicating that the algorithm can significantly increase the number of people evacuated per unit time and is both flexible and effective.
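The human-robot social force model mentioned in the abstract extends the classical social force model (Helbing et al.) with an additional robot term. A schematic sketch, with illustrative symbols only (the exact form of the robot force is the paper's contribution and is not reproduced here):

```latex
% Motion of pedestrian i under the human-robot social force model (schematic):
% driving force toward the desired velocity, pedestrian-pedestrian forces,
% wall forces, and the added human-robot interaction force.
m_i \frac{\mathrm{d}\vec{v}_i}{\mathrm{d}t}
  = m_i \frac{v_i^{0}\,\vec{e}_i - \vec{v}_i}{\tau_i}
  + \sum_{j \neq i} \vec{f}_{ij}
  + \sum_{W} \vec{f}_{iW}
  + \vec{f}_{i,\mathrm{robot}}
```

Here \(v_i^{0}\vec{e}_i\) is pedestrian \(i\)'s desired velocity, \(\tau_i\) a relaxation time, \(\vec{f}_{ij}\) and \(\vec{f}_{iW}\) the pedestrian and wall interaction forces of the original model, and \(\vec{f}_{i,\mathrm{robot}}\) the new term through which the robot influences the crowd's motion state.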

Key words: Deep Q-Network (DQN), crowd evacuation, motion planning, human-robot social force model, reinforcement learning
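The "environment-behavior-reward" loop described in the abstract can be sketched as a minimal DQN agent. This is an illustrative toy, not the paper's implementation: a small two-layer network stands in for the convolutional network over pedestrian-state images, the state is a plain vector, and the environment and reward are placeholders.

```python
import random
from collections import deque
import numpy as np

class TinyDQN:
    """Minimal DQN sketch: epsilon-greedy action selection over discrete
    robot behaviors, experience replay, and single-sample TD updates."""

    def __init__(self, state_dim, n_actions, hidden=32, lr=0.01, gamma=0.9):
        rng = np.random.default_rng(0)
        self.w1 = rng.normal(0, 0.1, (state_dim, hidden))
        self.w2 = rng.normal(0, 0.1, (hidden, n_actions))
        self.lr, self.gamma = lr, gamma
        self.replay = deque(maxlen=1000)  # experience replay buffer

    def q_values(self, s):
        h = np.maximum(0, s @ self.w1)    # ReLU hidden layer
        return h @ self.w2, h

    def act(self, s, epsilon=0.1):
        if random.random() < epsilon:     # epsilon-greedy exploration
            return random.randrange(self.w2.shape[1])
        q, _ = self.q_values(s)
        return int(np.argmax(q))

    def train_step(self, batch_size=16):
        if len(self.replay) < batch_size:
            return
        for s, a, r, s2, done in random.sample(self.replay, batch_size):
            q, h = self.q_values(s)
            q2, _ = self.q_values(s2)
            target = r if done else r + self.gamma * np.max(q2)
            td = target - q[a]            # temporal-difference error
            # SGD on 0.5 * td^2 for the chosen action only
            grad_out = np.zeros_like(q)
            grad_out[a] = -td
            self.w2 -= self.lr * np.outer(h, grad_out)
            dh = (self.w2 @ grad_out) * (h > 0)
            self.w1 -= self.lr * np.outer(s, dh)

# Toy closed loop standing in for the simulated evacuation environment:
# the state transition and reward below are placeholders.
agent = TinyDQN(state_dim=4, n_actions=3)
s = np.zeros(4)
for t in range(200):
    a = agent.act(s)
    s2 = np.clip(s + np.random.default_rng(t).normal(0, 0.1, 4), -1, 1)
    r = 1.0 if a == 0 else 0.0  # placeholder reward for a "helpful" behavior
    agent.replay.append((s, a, r, s2, False))
    agent.train_step()
    s = s2
```

In the paper this loop runs over images of the pedestrian state, and the reward is designed so that, over many iterations, the robot learns the motion strategy that maximizes the number of people evacuated per unit time.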
