[1] HELBING D, MOLNÁR P. Social force model for pedestrian dynamics[J]. Physical Review E:Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics, 1995, 51(5):4282-4286.
[2] ROBINETTE P, VELA P A, HOWARD A M. Information propagation applied to robot-assisted evacuation[C]//Proceedings of the 2012 IEEE International Conference on Robotics and Automation. Piscataway:IEEE, 2012:856-861.
[3] BOUKAS E, KOSTAVELIS I, GASTERATOS A, et al. Robot guided crowd evacuation[J]. IEEE Transactions on Automation Science and Engineering, 2015, 12(2):739-751.
[4] POLYDOROS A S, NALPANTIDIS L. Survey of model-based reinforcement learning:applications on robotics[J]. Journal of Intelligent and Robotic Systems, 2017, 86(2):153-173.
[5] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540):529-533.
[6] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[EB/OL].[2018-12-10]. http://export.arxiv.org/pdf/1312.5602.
[7] HWANG K, JIANG W, CHEN Y. Pheromone-based planning strategies in Dyna-Q learning[J]. IEEE Transactions on Industrial Informatics, 2017, 13(2):424-435.
[8] IMANBERDIYEV N, FU C, KAYACAN E, et al. Autonomous navigation of UAV by using real-time model-based reinforcement learning[C]//Proceedings of the 14th International Conference on Control, Automation, Robotics and Vision. Piscataway:IEEE, 2016:1-6.
[9] GIUSTI A, GUZZI J, CIRESAN D C, et al. A machine learning approach to visual perception of forest trails for mobile robots[J]. IEEE Robotics and Automation Letters, 2016, 1(2):661-667.
[10] SU M C, HUANG D, CHOW C, et al. A reinforcement learning approach to robot navigation[C]//Proceedings of the 2004 International Conference on Networking, Sensing and Control. Piscataway:IEEE, 2004:665-669.
[11] 胡学敏, 徐珊珊, 康美玉, 等. 基于人机社会力模型的人群疏散算法[J]. 计算机应用, 2018, 38(8):2165-2166. (HU X M, XU S S, KANG M Y, et al. Crowd evacuation based on human-robot social force model[J]. Journal of Computer Applications, 2018, 38(8):2165-2166.)
[12] XIE L H, WANG S, MARKHAM A, et al. Towards monocular vision based obstacle avoidance through deep reinforcement learning[EB/OL].[2018-12-10]. https://arxiv.org/pdf/1706.09829.pdf.
[13] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL].[2019-01-10]. https://arxiv.org/pdf/1509.02971.pdf.
[14] CUENCA Á, OJHA U, SALT J, et al. A non-uniform multi-rate control strategy for a Markov chain-driven networked control system[J]. Information Sciences, 2015, 321:31-47.
[15] 赵玉婷, 韩宝玲, 罗庆生. 基于deep Q-network双足机器人非平整地面行走稳定性控制方法[J]. 计算机应用, 2018, 38(9):2459-2463. (ZHAO Y T, HAN B L, LUO Q S. Walking stability control method based on deep Q-network for biped robot on uneven ground[J]. Journal of Computer Applications, 2018, 38(9):2459-2463.)
[16] CHEN Y F, LIU M, EVERETT M, et al. Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning[C]//Proceedings of the 2017 IEEE International Conference on Robotics and Automation. Piscataway:IEEE, 2017:285-292.
[17] CHEN D, VARSHNEY P K. A survey of void handling techniques for geographic routing in wireless networks[J]. IEEE Communications Surveys and Tutorials, 2007, 9(1):50-67.